There's a huge range of potential functionality/features that can reside under 
the banner "fuzzy address matching". 

At the simplest, you're trying to catch common lexicographic differences, like 
off-by-one addresses or alternate spellings. This is the realm of trigrams and 
levenstein distances.
Then you start dealing with different forms of abbreviations (rd, road, r) and 
formats (unit 4, #4, apt 4, 4). This is the realm of data-less standardizers.
Then you start dealing with larger forms of aliasing. Standard city and state 
names, recognizing major components of addresses. This is the real of 
dictionary-backed standardizers.
Then you start dealing with all forms of aliasing. This is actually getting 
down into geocoding, and address standardizers backed by complete address and 
road databases.

What people mean when they say "can you do address standardization" can vary 
massively. It can also be frustrating because the performance of something like 
Google's geocoder algorithm, backed by the largest, and most up-to-data 
geographic database in the world gets casually compared to simple format and 
dictionary backed standardizer by folks with no understanding of the complexity 
or amount of data that lives under the covers of this "simple" task. I think 
people are far more understanding of something like machine translation and 
"get" that it's a really hard problem, because learning a new language is hard. 
But geocoding is "easy", anyone can look at an address and then look that 
address up in a map book (ha ha, well anyone over 40 at least).

P.


> On May 11, 2020, at 10:15 AM, Imre Samu <[email protected]> wrote:
> 
> > ... Fuzzy Address Matching? ... 
> > For instance, South Great Avenue, A City, Planet Earth may be put down as 
> > the following:
> 
> sometimes .. the same  country/city/building   -  has a different names .. 
> 
> "A City"  example:
> https://en.wikipedia.org/wiki/Vienna
> 
> Vienna:
> "Beč (Croatian, Serbian, older Bulgarian), Beç (older Turkish)*, Bech or 
> Vidnya (Romani), Bécs (Hungarian)*, Bin / Pin - 빈 (Korean), Dunaj (Slovene)*, 
> Fienna (Welsh), Vedunia (Celtic), Vena - Вена (Russian), Vídeň (Czech)*, 
> Viden' / Videň (Ukrainian)*, Viedeň (Slovak), Viên (Vietnamese), Viena / 
> Vijena/ Виена (Belarusian, Bulgarian, Macedonian), Viena (Catalan*, 
> Lithuanian, Portuguese*, Romanian*, Spanish*, Tagalog*), Vienna (Italian)*, 
> Vienne (French)*, Viénni - Βιέννη (Greek), Vieno (Esperanto), Viin 
> (Estonian), Vin - װין (Yiddish), Vín (Irish, Icelandic), Vina - וינה 
> (Hebrew), Vínarborg (Icelandic variant), Vindobona (Latin), Vīne (Latvian)*, 
> Viyana (Turkish)*, Vjenë (Albanian), Vjenna (Maltese), Vyana (Azeri), Wean 
> (local Viennese, Austrian and Bavarian dialects)*, Weiyena - 維也納 (Chinese)*, 
> Wene (Afrikaans), Wenen (Dutch)*, Wiedeń (Polish)*, Wien (Danish*, Finnish*, 
> German*, Norwegian*, Swedish*), Wīn - ウィーン (Japanese)*, Wina (Indonesian), 
> فيينا (Arabic), وين (Persian)"
> "
> see: 
> https://en.wikipedia.org/wiki/Names_of_European_cities_in_different_languages_(U%E2%80%93Z)#V
> https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages
> 
> or see the Nominatim names for Vienna:
> https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=109166&class=boundary
> 
> 
> And sometimes - the buildings has a different names ...
> https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=11101&class=amenity
> - Wien Rathaus (name)
> - Ратуша (name:be)
> - Vídeňská radnice (name:cs)
> - Rathaus (name:de)
> - Vienna City Hall (name:en)
> - 市庁舎 (name:ja)
> - Ратуша (name:ru)
> - Віденська ратуша (name:uk)
> 
> 
> 
> Best,
>  Imre
> 
> 
> 
> 
> Shaozhong SHI <[email protected]> ezt írta (időpont: 2020. máj. 11., H, 
> 18:24):
> Hello,
> 
> I got a few questions as follows:
> 
> 1.  Which one is the best way for Fuzzy Address Matching?
> 
> 2.  FME FuzzyStringComparer uses  Python difflib module.  Which one in 
> Postgres is equivalent or similar to it?
> 
> 3.  Often, addresses collected by different people may well be correct.  But, 
> there may be typing errors, or addresses are composed not in a consistent 
> manner.
> 
> For instance, South Great Avenue, A City, Planet Earth may be put down as the 
> following:
> 
> S. Great Aveue, City A, Earth Planet
> Great Avene South, A City, Earth Planet
> Great Avenue S, A City, Planet Earth
> 
> Surely, there would be solutions to deal with this problem.
> 
> Can anyone enlighten me?
> 
> Regards,
> 
> Shao
> _______________________________________________
> postgis-users mailing list
> [email protected]
> https://lists.osgeo.org/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> [email protected]
> https://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to