Or use libpostal, as Komяpa suggested; I’m sure there are other options too. I’m just most familiar with my own code, which I built to work inside a PostgreSQL database.
Sent from my iPhone

> On Jan 9, 2021, at 10:00 AM, Stephen Woodbridge <[email protected]> wrote:
>
> David,
>
> Yup, and this is just one of dozens of cases that you have to deal with.
> You are dealing with a natural language processing problem, and you have
> to deal with human input that has typos and abbreviations.
>
> These issues are what the address standardizer fixes. It tokenizes the
> address, uses the gazetteer to standardize the terms, then classifies
> each term and assigns it to a part of the address based on a grammar.
>
> So there is a simple solution: use my address standardizer. It is free,
> MIT licensed, and has a sample lexicon/gazetteer and grammar for the UK.
> These are easy to modify to fit your needs, and it just works. Oh, and if
> you want to do another country, it also has sample files for 25 countries.
>
> Sent from my iPhone
>
> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski <[email protected]> wrote:
>
>> Hello,
>>
>> People make neural networks for this kind of task:
>>
>> https://github.com/openvenues/libpostal
>>
>> On Sat, 9 Jan 2021, 12:40, Shaozhong SHI <[email protected]> wrote:
>>> Hi, Steve W,
>>>
>>> It is easy to parse addresses into tokens. But it is difficult to put
>>> the tokens in the right columns, because the same address can be
>>> expressed as a partial address or as a full address.
>>>
>>> The same address can be written as, Flat 1 122 Great Avenue London UK,
>>> or Flat 1 122 Greet Avenue Central London London United Kingdom.
>>>
>>> When this happens, each version of the address has a different number
>>> of tokens. Is there a way to deal with this issue so that each token
>>> gets into the right column?
>>>
>>> Please enlighten me.
>>>
>>> Regards,
>>>
>>> David
>>>
>>>> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge
>>>> <[email protected]> wrote:
>>>> And I have created an address-standardizer project here
>>>> https://github.com/woodbri/address-standardizer which is user
>>>> configurable. It might be overkill if you just want to strip off the
>>>> number, in which case you might just use a SQL regexp replace to
>>>> remove it.
>>>>
>>>> -Steve W
>>>>
>>>> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
>>>> > PostGIS has an address_standardizer extension that includes
>>>> > parse_address() and standardize_address() functions.
>>>> >
>>>> > -Steve W
>>>> >
>>>> > On 4/24/2020 9:54 PM, Imre Samu wrote:
>>>> >> > handle addresses in postgresql
>>>> >>
>>>> >> Maybe you can use the https://github.com/openvenues/libpostal library
>>>> >> with your favorite language bindings ( Python / Ruby / Go / PHP /
>>>> >> Node / R / Java ...)
>>>> >>
>>>> >> or as a Postgres database extension:
>>>> >> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
>>>> >>
>>>> >> https://github.com/pramsey/pgsql-postal
>>>> >>
>>>> >> Regards,
>>>> >> Imre
>>>> >>
>>>> >> Shaozhong SHI <[email protected]> wrote (on Sat, Apr 25, 2020, 2:49):
>>>> >>
>>>> >>     I find this is a simple, but important question.
>>>> >>
>>>> >>     How best to split numbers and the rest of address?
>>>> >>
>>>> >>     For instance, one tricky one is as follows:
>>>> >>
>>>> >>     21-1 Great Avenue, a city, a country, this planet
>>>> >>
>>>> >>     How to turn this into the following:
>>>> >>
>>>> >>     column 1, column 2
>>>> >>
>>>> >>     21-1 Great Avenue, a city, a country, this planet
>>>> >>
>>>> >>     Note: there is a hyphen in 21-1
>>>> >>
>>>> >>     Any clue?
>>>> >>
>>>> >>     Regards,
>>>> >>
>>>> >>     Shao
>>>> >>     _______________________________________________
>>>> >>     postgis-users mailing list
>>>> >>     [email protected]
>>>> >>     https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
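
For readers of the archive, here is a minimal Python sketch of the tokenize / standardize / classify / assign idea described in the thread. It is not the address-standardizer project's code or the PostGIS extension's API. The tiny `LEXICON`, the token class names, and the functions `tokenize`, `classify`, `assign_columns`, and `parse` are all hypothetical, and the simple left-to-right rules stand in for a real grammar. It assumes well-spelled input and does no typo correction.

```python
import re

# Hypothetical mini-lexicon: raw token(s) -> (standard form, token class).
# A real lexicon/gazetteer is far larger and country-specific.
LEXICON = {
    "flat": ("FLAT", "UNIT_TYPE"),
    "apt": ("FLAT", "UNIT_TYPE"),
    "avenue": ("AVENUE", "STREET_TYPE"),
    "ave": ("AVENUE", "STREET_TYPE"),
    "central london": ("CENTRAL LONDON", "DISTRICT"),
    "london": ("LONDON", "CITY"),
    "uk": ("UNITED KINGDOM", "COUNTRY"),
    "united kingdom": ("UNITED KINGDOM", "COUNTRY"),
}

# House or unit numbers such as "122", "21-1" or "12a".
NUMBER_RE = re.compile(r"^\d+(-\d+)?[a-z]?$")


def tokenize(address):
    """Lower-case the address and split it on commas and whitespace."""
    return [t for t in re.split(r"[,\s]+", address.lower()) if t]


def classify(tokens):
    """Standardize tokens via the lexicon, preferring two-word matches."""
    out = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in LEXICON:
            out.append(LEXICON[pair])
            i += 2
        elif tokens[i] in LEXICON:
            out.append(LEXICON[tokens[i]])
            i += 1
        elif NUMBER_RE.match(tokens[i]):
            out.append((tokens[i].upper(), "NUMBER"))
            i += 1
        else:
            out.append((tokens[i].upper(), "WORD"))
            i += 1
    return out


def assign_columns(classified):
    """Place each classified token in a column by class, not by position."""
    cols = {"unit": "", "house_number": "", "street": "",
            "district": "", "city": "", "country": ""}
    street = []
    prev = None
    for text, kind in classified:
        if kind == "UNIT_TYPE":
            cols["unit"] = text
        elif kind == "NUMBER" and prev == "UNIT_TYPE":
            cols["unit"] += " " + text          # e.g. "FLAT 1"
        elif kind == "NUMBER" and not cols["house_number"]:
            cols["house_number"] = text
        elif kind in ("WORD", "STREET_TYPE"):
            street.append(text)                 # unknown words go to street
        elif kind in ("DISTRICT", "CITY", "COUNTRY"):
            cols[kind.lower()] = text
        prev = kind
    cols["street"] = " ".join(street)
    return cols


def parse(address):
    return assign_columns(classify(tokenize(address)))
```

Because columns are chosen by token class rather than token position, `parse("Flat 1 122 Great Avenue London UK")` and `parse("Flat 1, 122 Great Avenue, Central London, London, United Kingdom")` fill the same unit, house number, street, city, and country columns; the longer form only adds a district. The hyphenated `21-1` from the original question is kept whole as a house number.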
