Hi, Stephen,

Please send me the link to libpostal. I also need information on how to install it on PostGIS. I need this information to instruct our ICT staff, so that they can get it ready.
Regards,

David

On Sat, 9 Jan 2021 at 15:04, Stephen Woodbridge <[email protected]> wrote:

> Or use libpostal as Komяpa suggested and I’m sure there are others also.
> I’m just familiar with my own code and the fact that I built it to work
> inside a postgresql database.
>
> Sent from my iPhone
>
> On Jan 9, 2021, at 10:00 AM, Stephen Woodbridge <[email protected]> wrote:
>
> David,
>
> Yup, and this is just one of dozens of cases that you have to deal with.
> You are dealing with a natural language processing problem. And you have
> to deal with human input that has typos and abbreviations.
>
> These issues are what the address standardizer fixes. It tokenizes the
> address and uses the gazetteer to standardize the terms and then classifies
> each term and assigns it to part of the address based on a grammar.
>
> So there is a simple solution: use my address standardizer. It is free,
> MIT license, it has a sample lexicon/gazetteer and grammar for the UK, it
> is easy to modify these to fit your needs, and it just works. Oh, if you
> want to do another country, it also has sample files for 25 countries.
>
> Sent from my iPhone
>
> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski <[email protected]> wrote:
>
> Hello,
>
> People make neural networks for this kind of task:
>
> https://github.com/openvenues/libpostal
>
> Sat, 9 Jan 2021, 12:40, Shaozhong SHI <[email protected]> wrote:
>
>> Hi, Steve W,
>>
>> It is easy to parse addresses as tokens. But it is difficult to put
>> tokens in the right columns, because the same address could be expressed
>> as a partial address or a full address.
>>
>> The same address can be written like, Flat 1 122 Great Avenue London UK,
>> or Flat 1 122 Greet Avenue Central London London United Kingdom.
>>
>> When this happens, each address has a different number of tokens.
>> Is there a way to deal with this issue so that each token can get into
>> the right column?
>>
>> Please enlighten me.
>>
>> Regards,
>>
>> David
>>
>> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge <[email protected]> wrote:
>>
>>> And I have created an address-standardizer project here
>>> https://github.com/woodbri/address-standardizer which is user
>>> configurable. It might be overkill if you just want to strip off the
>>> number, in which case you might just use a SQL regexp replace to remove
>>> it.
>>>
>>> -Steve W
>>>
>>> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
>>> > PostGIS has an address_standardizer extension that includes
>>> > parse_address() and standardize_address() functions.
>>> >
>>> > -Steve W
>>> >
>>> > On 4/24/2020 9:54 PM, Imre Samu wrote:
>>> >> > handle addresses in postgresql
>>> >>
>>> >> maybe you can use the https://github.com/openvenues/libpostal library
>>> >> with your favorite language bindings ( Python / Ruby / Go / PHP /
>>> >> Node / R / Java ...)
>>> >>
>>> >> or as a Postgres database extension:
>>> >>
>>> >> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
>>> >>
>>> >> https://github.com/pramsey/pgsql-postal
>>> >>
>>> >> Regards,
>>> >> Imre
>>> >>
>>> >> Shaozhong SHI <[email protected]> wrote (on Sat, 25 Apr 2020, 2:49):
>>> >>
>>> >>     I find this is a simple, but important question.
>>> >>
>>> >>     How best to split numbers and the rest of address?
>>> >>
>>> >>     For instance, one tricky one is as follows:
>>> >>
>>> >>     21-1 Great Avenue, a city, a country, this planet
>>> >>
>>> >>     How to turn this into the following:
>>> >>
>>> >>     column 1, column 2
>>> >>
>>> >>     21-1 Great Avenue, a city, a country, this planet
>>> >>
>>> >>     Note: there is a hyphen in 21-1
>>> >>
>>> >>     Any clue?
>>> >>
>>> >>     Regards,
>>> >>
>>> >>     Shao
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users
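
For the narrow case raised at the start of the thread (splitting a leading number such as 21-1 from the rest of the address), a regular expression may be enough. What follows is only a minimal Python sketch, not code from address-standardizer, libpostal, or PostGIS; the pattern and the function name split_number are assumptions made for illustration. It handles a number at the very start of the string and nothing else, which is exactly why addresses like "Flat 1 122 Great Avenue" need a real standardizer.

```python
import re

# Assumed pattern (illustrative only): a leading number, optionally with a
# single letter suffix ("12A") and hyphen-separated parts ("21-1"),
# followed by whitespace and then the rest of the address.
LEADING_NUMBER = re.compile(
    r"^\s*(\d+[A-Za-z]?(?:-\d+[A-Za-z]?)*)\s+(.*\S)\s*$"
)


def split_number(address):
    """Return (number, rest); number is None if no leading number is found."""
    match = LEADING_NUMBER.match(address)
    if match is None:
        # No leading number: return the address unchanged (trimmed).
        return None, address.strip()
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for text in (
        "21-1 Great Avenue, a city, a country, this planet",
        "Flat 1 122 Great Avenue London UK",
    ):
        print(split_number(text))
```

The second address starts with "Flat", so the sketch returns None for the number and leaves the text as it is. Cases like that one are where the lexicon and grammar approach, or libpostal, comes in.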
