Or use libpostal, as Komяpa suggested; I’m sure there are other options too. I’m 
just most familiar with my own code, which I built to work inside a 
PostgreSQL database. 

Sent from my iPhone

> On Jan 9, 2021, at 10:00 AM, Stephen Woodbridge 
> <[email protected]> wrote:
> 
> David,
> 
> Yup, and this is just one of dozens of cases that you have to deal with. You 
> are dealing with a natural language processing problem, and you have to deal 
> with human input that has typos and abbreviations. 
> 
> These issues are what the address standardizer fixes. It tokenizes the 
> address, uses the gazetteer to standardize the terms, then classifies 
> each term and assigns it to a part of the address based on a grammar. 
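
To make the tokenize, standardize, classify and assign steps above concrete,
here is a minimal Python sketch. It is a toy illustration only, not the
address-standardizer's actual code or API: the LEXICON entries, the token
classes, and the function names (tokenize, classify, assign, standardize) are
all assumptions made up for this example, and the "grammar" is reduced to a
few hard-coded rules.

```python
import re

# Hypothetical toy lexicon: raw token -> (standard form, token class).
# A real lexicon/gazetteer would be much larger and country-specific.
LEXICON = {
    "flat": ("FLAT", "UNIT_TYPE"),
    "apt": ("FLAT", "UNIT_TYPE"),
    "avenue": ("AVENUE", "STREET_TYPE"),
    "ave": ("AVENUE", "STREET_TYPE"),
    "london": ("LONDON", "CITY"),
    "uk": ("UNITED KINGDOM", "COUNTRY"),
    "united kingdom": ("UNITED KINGDOM", "COUNTRY"),
}


def tokenize(address):
    # Split on commas and whitespace; "21-1" stays a single token.
    return [t for t in re.split(r"[,\s]+", address.strip()) if t]


def classify(tokens):
    # Standardize each token via the lexicon and attach a class to it.
    out = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2]).lower()
        if i + 1 < len(tokens) and pair in LEXICON:
            # Two-word phrase such as "United Kingdom".
            out.append(LEXICON[pair])
            i += 2
            continue
        word = tokens[i].lower()
        if word in LEXICON:
            out.append(LEXICON[word])
        elif re.fullmatch(r"\d+(-\d+)?[a-z]?", word):
            out.append((tokens[i].upper(), "NUMBER"))
        else:
            out.append((tokens[i].upper(), "WORD"))
        i += 1
    return out


def assign(classified):
    # Stand-in for a grammar: a few fixed rules that map classes to columns.
    fields = {"unit": "", "house_number": "", "street": [],
              "locality": [], "city": "", "country": ""}
    street_done = False
    expect_unit_number = False
    for text, kind in classified:
        if kind == "UNIT_TYPE":
            expect_unit_number = True
        elif kind == "NUMBER":
            if expect_unit_number:
                fields["unit"] = text
                expect_unit_number = False
            elif not fields["house_number"]:
                fields["house_number"] = text
        elif kind == "STREET_TYPE":
            fields["street"].append(text)
            street_done = True
        elif kind == "CITY":
            fields["city"] = text  # repeated city names collapse to one
        elif kind == "COUNTRY":
            fields["country"] = text
        else:
            # Unknown words belong to the street until the street type is seen.
            target = "locality" if street_done else "street"
            fields[target].append(text)
    fields["street"] = " ".join(fields["street"])
    fields["locality"] = " ".join(fields["locality"])
    return fields


def standardize(address):
    return assign(classify(tokenize(address)))
```

With this toy lexicon, both "Flat 1 122 Great Avenue London UK" and
"Flat 1 122 Great Avenue Central London London United Kingdom" end up with
the same unit, house number, street, city and country, even though the token
counts differ; the extra "Central" lands in a separate locality field.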
> 
> So there is a simple solution: use my address standardizer. It is free 
> (MIT license), it has a sample lexicon/gazetteer and grammar for the UK, 
> which are easy to modify to fit your needs, and it just works. Oh, and if you 
> want to do another country, it also has sample files for 25 countries.
> 
> Sent from my iPhone
> 
>>> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski <[email protected]> 
>>> wrote:
>>> 
>> 
>> Hello,
>> 
>> People make neural networks for this kind of task:
>> 
>> https://github.com/openvenues/libpostal
>> 
>> On Sat, 9 Jan 2021, 12:40, Shaozhong SHI <[email protected]> 
>> wrote:
>>> Hi, Steve W,
>>> 
>>> It is easy to parse addresses into tokens, but it is difficult to put the 
>>> tokens in the right columns, because the same address can be written as a 
>>> partial address or a full address.
>>> 
>>> The same address can be written as Flat 1 122 Great Avenue London UK, 
>>> or as Flat 1 122 Great Avenue Central London London United Kingdom.
>>> 
>>> When this happens, the two versions have different numbers of tokens. 
>>> Is there a way to deal with this issue so 
>>> that each token gets into the right column?
>>> 
>>> Please enlighten me.
>>> 
>>> Regards,
>>> 
>>> David
>>> 
>>>> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge 
>>>> <[email protected]> wrote:
>>>> And I have created an address-standardizer project here 
>>>> https://github.com/woodbri/address-standardizer which is user 
>>>> configurable. It might be overkill if you just want to strip off the 
>>>> number, in which case you might just use a SQL regexp replace to remove it.
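
As a rough illustration of that regexp approach, here is a small Python
sketch that splits a leading house number (including hyphenated ones like
21-1) from the rest of the address. The pattern and the function name
split_house_number are assumptions for this example; in PostgreSQL the same
pattern idea would go through its own regular expression functions rather
than this code.

```python
import re

# Leading digits, an optional "-digits" part, an optional single letter
# suffix (e.g. "122A"), then whitespace and the remainder of the address.
LEADING_NUMBER = re.compile(r"^\s*(\d+(?:-\d+)?[A-Za-z]?)\s+(.*)$")


def split_house_number(address):
    """Return (number, rest); number is "" when no leading number is found."""
    match = LEADING_NUMBER.match(address)
    if match is None:
        return "", address.strip()
    return match.group(1), match.group(2).strip()


number, rest = split_house_number(
    "21-1 Great Avenue, a city, a country, this planet"
)
print(number)  # column 1
print(rest)    # column 2
```

This only handles the simple "number first" layout from the original
question; anything messier is where a lexicon and grammar start to pay off.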
>>>> 
>>>> -Steve W
>>>> 
>>>> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
>>>> > PostGIS has an address_standardizer extension that includes 
>>>> > parse_address() and standardize_address() functions.
>>>> >
>>>> > -Steve W
>>>> >
>>>> > On 4/24/2020 9:54 PM, Imre Samu wrote:
>>>> >> > handle addresses in postgresql
>>>> >>
>>>> >> maybe you can use the https://github.com/openvenues/libpostal library
>>>> >> with your favorite language bindings ( Python / Ruby / Go / PHP / 
>>>> >> Node / R / Java  ...)
>>>> >>
>>>> >> or as a Postgres database extension:
>>>> >> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
>>>> >>  
>>>> >>
>>>> >> https://github.com/pramsey/pgsql-postal
>>>> >>
>>>> >> Regards,
>>>> >>  Imre
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> Shaozhong SHI <[email protected]> wrote (on Sat, 25 Apr 2020, 
>>>> >> 2:49):
>>>> >>
>>>> >>     I find this is a simple, but important question.
>>>> >>
>>>> >>     How best to split numbers and the rest of address?
>>>> >>
>>>> >>     For instance, one tricky one is as follows:
>>>> >>
>>>> >>     21-1 Great Avenue, a city, a country, this planet
>>>> >>
>>>> >>     How to turn this into the following:
>>>> >>
>>>> >>     column 1,       column 2
>>>> >>
>>>> >>       21-1              Great Avenue, a city, a country, this planet
>>>> >>
>>>> >>     Note:  there is a hyphen in  21-1
>>>> >>
>>>> >>     Any clue?
>>>> >>
>>>> >>     Regards,
>>>> >>
>>>> >>     Shao
>>>> >>     _______________________________________________
>>>> >>     postgis-users mailing list
>>>> >>     [email protected] <mailto:[email protected]>
>>>> >>     https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>> >>
>>>> >>
>>>> >
>>>> 