Dear All Friends, Thank you very much for all the information.
I will spend time to digest and think over all these. It is certainly a very interest topic. Regards, Shao On Mon, 11 May 2020 at 18:34, James Klassen <[email protected]> wrote: > There are lots of classes of tricky cases: > > For example: > > 1234 County Road E, Hudson > > Is actually "County Road E", not "County Road East". The county roads > there are indexed by letters. There is also a County Road F, and a County > Road N, and a County Road FF, etc.) Last I checked, even Google, with all > their data, gets this one wrong. > > In general, address parsing is a hard problem and there are always (at > least in the US) going to be cases that are ambiguous to parse. The people > who have been answering your questions on this list have been working on > this problem for as long as I can remember. And they have spent many years > developing libraries that make dealing with many of the corner cases of > address parsing easier. They started these libraries because the quick and > simple solutions you seem to be hoping for failed to deal with the > realities of addressing. There is not a perfect solution, but what they > are suggesting are the best solutions available. > > If you want to start over and reinvent a simpler solution, go ahead. But > be advised that the people answering you have been down that path. They > know what the next problem will be, and the problem after that, and they > know that at the end of the chain of problems you will discover you need > something like the tools and libraries they are suggesting. > > On Mon, May 11, 2020 at 12:31 PM Paul Ramsey <[email protected]> > wrote: > >> There's a huge range of potential functionality/features that can reside >> under the banner "fuzzy address matching". >> >> At the simplest, you're trying to catch common lexicographic differences, >> like off-by-one addresses or alternate spellings. This is the realm of >> trigrams and levenstein distances. >> Then you start dealing with different forms of abbreviations (rd, road, >> r) and formats (unit 4, #4, apt 4, 4). This is the realm of data-less >> standardizers. >> Then you start dealing with larger forms of aliasing. Standard city and >> state names, recognizing major components of addresses. This is the real of >> dictionary-backed standardizers. >> Then you start dealing with all forms of aliasing. This is actually >> getting down into geocoding, and address standardizers backed by complete >> address and road databases. >> >> What people mean when they say "can you do address standardization" can >> vary massively. It can also be frustrating because the performance of >> something like Google's geocoder algorithm, backed by the largest, and most >> up-to-data geographic database in the world gets casually compared to >> simple format and dictionary backed standardizer by folks with no >> understanding of the complexity or amount of data that lives under the >> covers of this "simple" task. I think people are far more understanding of >> something like machine translation and "get" that it's a really hard >> problem, because learning a new language is hard. But geocoding is "easy", >> anyone can look at an address and then look that address up in a map book >> (ha ha, well anyone over 40 at least). >> >> P. >> >> >> > On May 11, 2020, at 10:15 AM, Imre Samu <[email protected]> wrote: >> > >> > > ... Fuzzy Address Matching? ... >> > > For instance, South Great Avenue, A City, Planet Earth may be put >> down as the following: >> > >> > sometimes .. the same country/city/building - has a different names >> .. >> > >> > "A City" example: >> > https://en.wikipedia.org/wiki/Vienna >> > >> > Vienna: >> > "Beč (Croatian, Serbian, older Bulgarian), Beç (older Turkish)*, Bech >> or Vidnya (Romani), Bécs (Hungarian)*, Bin / Pin - 빈 (Korean), Dunaj >> (Slovene)*, Fienna (Welsh), Vedunia (Celtic), Vena - Вена (Russian), Vídeň >> (Czech)*, Viden' / Videň (Ukrainian)*, Viedeň (Slovak), Viên (Vietnamese), >> Viena / Vijena/ Виена (Belarusian, Bulgarian, Macedonian), Viena (Catalan*, >> Lithuanian, Portuguese*, Romanian*, Spanish*, Tagalog*), Vienna (Italian)*, >> Vienne (French)*, Viénni - Βιέννη (Greek), Vieno (Esperanto), Viin >> (Estonian), Vin - װין (Yiddish), Vín (Irish, Icelandic), Vina - וינה >> (Hebrew), Vínarborg (Icelandic variant), Vindobona (Latin), Vīne >> (Latvian)*, Viyana (Turkish)*, Vjenë (Albanian), Vjenna (Maltese), Vyana >> (Azeri), Wean (local Viennese, Austrian and Bavarian dialects)*, Weiyena - >> 維也納 (Chinese)*, Wene (Afrikaans), Wenen (Dutch)*, Wiedeń (Polish)*, Wien >> (Danish*, Finnish*, German*, Norwegian*, Swedish*), Wīn - ウィーン (Japanese)*, >> Wina (Indonesian), فيينا (Arabic), وين (Persian)" >> > " >> > see: >> https://en.wikipedia.org/wiki/Names_of_European_cities_in_different_languages_(U%E2%80%93Z)#V >> > >> https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages >> > >> > or see the Nominatim names for Vienna: >> > >> https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=109166&class=boundary >> > >> > >> > And sometimes - the buildings has a different names ... >> > >> https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=11101&class=amenity >> > - Wien Rathaus (name) >> > - Ратуша (name:be) >> > - Vídeňská radnice (name:cs) >> > - Rathaus (name:de) >> > - Vienna City Hall (name:en) >> > - 市庁舎 (name:ja) >> > - Ратуша (name:ru) >> > - Віденська ратуша (name:uk) >> > >> > >> > >> > Best, >> > Imre >> > >> > >> > >> > >> > Shaozhong SHI <[email protected]> ezt írta (időpont: 2020. máj. >> 11., H, 18:24): >> > Hello, >> > >> > I got a few questions as follows: >> > >> > 1. Which one is the best way for Fuzzy Address Matching? >> > >> > 2. FME FuzzyStringComparer uses Python difflib module. Which one in >> Postgres is equivalent or similar to it? >> > >> > 3. Often, addresses collected by different people may well be >> correct. But, there may be typing errors, or addresses are composed not in >> a consistent manner. >> > >> > For instance, South Great Avenue, A City, Planet Earth may be put down >> as the following: >> > >> > S. Great Aveue, City A, Earth Planet >> > Great Avene South, A City, Earth Planet >> > Great Avenue S, A City, Planet Earth >> > >> > Surely, there would be solutions to deal with this problem. >> > >> > Can anyone enlighten me? >> > >> > Regards, >> > >> > Shao >> > _______________________________________________ >> > postgis-users mailing list >> > [email protected] >> > https://lists.osgeo.org/mailman/listinfo/postgis-users >> > _______________________________________________ >> > postgis-users mailing list >> > [email protected] >> > https://lists.osgeo.org/mailman/listinfo/postgis-users >> >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
