On Wed, May 29, 2013 at 07:55:22AM -0400, Stewart C. Russell wrote:
> On 13-05-29 04:32 AM, Michal Palenik wrote:
> >
> > what would be the easiest option to connect misspelled names to their
> > properlyspelled counterparts?
>
> How are your programming skills? The classic way of doing this is using
> an approximate string match (or "fuzzy match") using the Levenshtein or
thanks for the proper keyword
http://www.postgresql.org/docs/9.1/static/fuzzystrmatch.html

... where levenshtein_less_equal(name1, name2, 1) <= 1 ...

(for the slovak language those other 2 did not make sense)

was successful, kind of... (i still had to decide whether "Marin" is a
misspelling of the city "Martin" or of some village 2000+ km away)

the distance measurement would improve a lot if the layout of the
keyboard were considered (on a qwerty keyboard, an r->t change is more
probable than a->p) and if an exchange of two letters were punished less
(eg levenshtein('extralongword', 'extralognword') vs
levenshtein('extralongword', 'extralohjword') both return 2, even though
the first one is in human terms more likely)

i've tried to read the source code
http://doxygen.postgresql.org/levenshtein_8c.html#a3887230c68a3fee3cb0cc496614468eb
but my C skills are definitely not at that level...

anyway, using levenshtein/fuzzymatch in nominatim would probably be a
great performance hit.

michal

> Damerau-Levenshtein methods. There are modules to do this for many
> scripting languages (like Text::Fuzzy in Perl). There is also the
> command line tool 'agrep' which does the same thing.
>
> I'd recommend you manually check the results. I know it's slow, but
> there's no way to get this perfectly right automatically.
>
> cheers,
>  Stewart
>
>
> _______________________________________________
> Geocoding mailing list
> [email protected]
> http://lists.openstreetmap.org/listinfo/geocoding

-- 
michal palenik
institut zamestnanosti
www.iz.sk
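
for illustration only, a rough Python sketch of the two ideas discussed in this message: counting a swap of two adjacent letters as one edit (the "optimal string alignment" variant of Damerau-Levenshtein) and making substitutions between neighbouring keys cheaper. this is not how fuzzystrmatch works internally; the names osa_distance, keyboard_sub_cost and QWERTY_ROWS, the 0.5 cost, and the "same row, adjacent key" rule are all assumptions made up for the example.

```python
# Hypothetical sketch: none of these names come from PostgreSQL or Nominatim.

QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")  # assumed layout, letters only


def levenshtein(a, b):
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def keyboard_sub_cost(x, y):
    """Assumed cost model: 0.5 for horizontally adjacent keys, otherwise 1."""
    if x == y:
        return 0.0
    for row in QWERTY_ROWS:
        i, j = row.find(x), row.find(y)
        if i != -1 and j != -1 and abs(i - j) == 1:
            return 0.5
    return 1.0


def osa_distance(a, b, sub_cost=None):
    """Optimal string alignment distance: Levenshtein plus adjacent swaps at cost 1."""
    if sub_cost is None:
        sub_cost = lambda x, y: 0 if x == y else 1
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(d[i - 1][j] + 1,                                   # deletion
                          d[i][j - 1] + 1,                                   # insertion
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))    # substitution
            # Two adjacent letters exchanged count as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


if __name__ == "__main__":
    print(levenshtein("extralongword", "extralognword"))        # 2
    print(levenshtein("extralongword", "extralohjword"))        # 2
    print(osa_distance("extralongword", "extralognword"))       # 1
    print(osa_distance("extralongword", "extralohjword"))       # 2
    print(osa_distance("martin", "marrin", keyboard_sub_cost))  # 0.5
    print(osa_distance("martin", "mprtin", keyboard_sub_cost))  # 1.0
```

with these assumptions the swapped-letter variant scores lower than the two-substitution variant, and t->r scores lower than a->p. it does not settle the "Marin" vs "Martin" question, since a dropped letter still costs 1 either way, so manual checking is still needed.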

