Sorry. It is a very interesting topic, so far as our field is concerned. Regards,
Shao On Tue, 12 May 2020 at 03:55, Shaozhong SHI <[email protected]> wrote: > Dear All Friends, > > Thank you very much for all the information. > > I will spend time to digest and think over all these. > > It is certainly a very interest topic. > > Regards, > > Shao > > On Mon, 11 May 2020 at 18:34, James Klassen <[email protected]> wrote: > >> There are lots of classes of tricky cases: >> >> For example: >> >> 1234 County Road E, Hudson >> >> Is actually "County Road E", not "County Road East". The county roads >> there are indexed by letters. There is also a County Road F, and a County >> Road N, and a County Road FF, etc.) Last I checked, even Google, with all >> their data, gets this one wrong. >> >> In general, address parsing is a hard problem and there are always (at >> least in the US) going to be cases that are ambiguous to parse. The people >> who have been answering your questions on this list have been working on >> this problem for as long as I can remember. And they have spent many years >> developing libraries that make dealing with many of the corner cases of >> address parsing easier. They started these libraries because the quick and >> simple solutions you seem to be hoping for failed to deal with the >> realities of addressing. There is not a perfect solution, but what they >> are suggesting are the best solutions available. >> >> If you want to start over and reinvent a simpler solution, go ahead. But >> be advised that the people answering you have been down that path. They >> know what the next problem will be, and the problem after that, and they >> know that at the end of the chain of problems you will discover you need >> something like the tools and libraries they are suggesting. >> >> On Mon, May 11, 2020 at 12:31 PM Paul Ramsey <[email protected]> >> wrote: >> >>> There's a huge range of potential functionality/features that can reside >>> under the banner "fuzzy address matching". >>> >>> At the simplest, you're trying to catch common lexicographic >>> differences, like off-by-one addresses or alternate spellings. This is the >>> realm of trigrams and levenstein distances. >>> Then you start dealing with different forms of abbreviations (rd, road, >>> r) and formats (unit 4, #4, apt 4, 4). This is the realm of data-less >>> standardizers. >>> Then you start dealing with larger forms of aliasing. Standard city and >>> state names, recognizing major components of addresses. This is the real of >>> dictionary-backed standardizers. >>> Then you start dealing with all forms of aliasing. This is actually >>> getting down into geocoding, and address standardizers backed by complete >>> address and road databases. >>> >>> What people mean when they say "can you do address standardization" can >>> vary massively. It can also be frustrating because the performance of >>> something like Google's geocoder algorithm, backed by the largest, and most >>> up-to-data geographic database in the world gets casually compared to >>> simple format and dictionary backed standardizer by folks with no >>> understanding of the complexity or amount of data that lives under the >>> covers of this "simple" task. I think people are far more understanding of >>> something like machine translation and "get" that it's a really hard >>> problem, because learning a new language is hard. But geocoding is "easy", >>> anyone can look at an address and then look that address up in a map book >>> (ha ha, well anyone over 40 at least). >>> >>> P. >>> >>> >>> > On May 11, 2020, at 10:15 AM, Imre Samu <[email protected]> wrote: >>> > >>> > > ... Fuzzy Address Matching? ... >>> > > For instance, South Great Avenue, A City, Planet Earth may be put >>> down as the following: >>> > >>> > sometimes .. the same country/city/building - has a different >>> names .. >>> > >>> > "A City" example: >>> > https://en.wikipedia.org/wiki/Vienna >>> > >>> > Vienna: >>> > "Beč (Croatian, Serbian, older Bulgarian), Beç (older Turkish)*, Bech >>> or Vidnya (Romani), Bécs (Hungarian)*, Bin / Pin - 빈 (Korean), Dunaj >>> (Slovene)*, Fienna (Welsh), Vedunia (Celtic), Vena - Вена (Russian), Vídeň >>> (Czech)*, Viden' / Videň (Ukrainian)*, Viedeň (Slovak), Viên (Vietnamese), >>> Viena / Vijena/ Виена (Belarusian, Bulgarian, Macedonian), Viena (Catalan*, >>> Lithuanian, Portuguese*, Romanian*, Spanish*, Tagalog*), Vienna (Italian)*, >>> Vienne (French)*, Viénni - Βιέννη (Greek), Vieno (Esperanto), Viin >>> (Estonian), Vin - װין (Yiddish), Vín (Irish, Icelandic), Vina - וינה >>> (Hebrew), Vínarborg (Icelandic variant), Vindobona (Latin), Vīne >>> (Latvian)*, Viyana (Turkish)*, Vjenë (Albanian), Vjenna (Maltese), Vyana >>> (Azeri), Wean (local Viennese, Austrian and Bavarian dialects)*, Weiyena - >>> 維也納 (Chinese)*, Wene (Afrikaans), Wenen (Dutch)*, Wiedeń (Polish)*, Wien >>> (Danish*, Finnish*, German*, Norwegian*, Swedish*), Wīn - ウィーン (Japanese)*, >>> Wina (Indonesian), فيينا (Arabic), وين (Persian)" >>> > " >>> > see: >>> https://en.wikipedia.org/wiki/Names_of_European_cities_in_different_languages_(U%E2%80%93Z)#V >>> > >>> https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages >>> > >>> > or see the Nominatim names for Vienna: >>> > >>> https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=109166&class=boundary >>> > >>> > >>> > And sometimes - the buildings has a different names ... >>> > >>> https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=11101&class=amenity >>> > - Wien Rathaus (name) >>> > - Ратуша (name:be) >>> > - Vídeňská radnice (name:cs) >>> > - Rathaus (name:de) >>> > - Vienna City Hall (name:en) >>> > - 市庁舎 (name:ja) >>> > - Ратуша (name:ru) >>> > - Віденська ратуша (name:uk) >>> > >>> > >>> > >>> > Best, >>> > Imre >>> > >>> > >>> > >>> > >>> > Shaozhong SHI <[email protected]> ezt írta (időpont: 2020. máj. >>> 11., H, 18:24): >>> > Hello, >>> > >>> > I got a few questions as follows: >>> > >>> > 1. Which one is the best way for Fuzzy Address Matching? >>> > >>> > 2. FME FuzzyStringComparer uses Python difflib module. Which one in >>> Postgres is equivalent or similar to it? >>> > >>> > 3. Often, addresses collected by different people may well be >>> correct. But, there may be typing errors, or addresses are composed not in >>> a consistent manner. >>> > >>> > For instance, South Great Avenue, A City, Planet Earth may be put down >>> as the following: >>> > >>> > S. Great Aveue, City A, Earth Planet >>> > Great Avene South, A City, Earth Planet >>> > Great Avenue S, A City, Planet Earth >>> > >>> > Surely, there would be solutions to deal with this problem. >>> > >>> > Can anyone enlighten me? >>> > >>> > Regards, >>> > >>> > Shao >>> > _______________________________________________ >>> > postgis-users mailing list >>> > [email protected] >>> > https://lists.osgeo.org/mailman/listinfo/postgis-users >>> > _______________________________________________ >>> > postgis-users mailing list >>> > [email protected] >>> > https://lists.osgeo.org/mailman/listinfo/postgis-users >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> https://lists.osgeo.org/mailman/listinfo/postgis-users >> >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> https://lists.osgeo.org/mailman/listinfo/postgis-users > >
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
