Hi, Steve, Another solution appeals me most is as follows:
Given a space delimited full address line, we can parse it to correct BS7666 format. Something like house number, street, area, city, postcode E.g., Nest 2 my nesting place 1B Great Avenue Forest Park London WS22 5TT Can you enlighten me about that? Regards , David On Saturday, 9 January 2021, Stephen Woodbridge < [email protected]> wrote: > David, > > This is the link to the address standardizer: > https://github.com/woodbri/address-standardizer > > This is a link to all my code that I developed consulting. It includes a > few SQL geocoders based on the code above. And has some README files > discussing how to build a geocoder which is the basis for how the geocoders > work. > > https://github.com/woodbri/imaptools.com > > this is the geocoder for Tiger data, but the code is essentially the same > for every country because the when you load country specific data into the > database it goes into its own table and then you standardize that data into > stdstreets table and all queries are done against the stdstreets table and > you only have to tweak the address range interpolation function which needs > to access the source streets table for the geometry and house number ranges. > > https://github.com/woodbri/imaptools.com/blob/master/sql-scr > ipts/geocoder/prep-tiger-geo-new.sql > > I would approach this by: > > 1. get the address standardizer compiled and installed. I can help if you > run into problems or have questions. > 2. load your UK street data into rawdata schema, ideally it would be best > if we can create a table/view that presents this data as a single table > where each record represents one side of the street and one jurisdiction > this may mean that a single record in your source data will generate > multiple records in this table/view (this greatly simplifies the coding and > performance later) > 3. look at the prep-tiger-geo-new.sql file > 4. create a stdstreets table and standardize your table/view data into it > 5. look at standardization failures and adjust lexicon and grammar as > needed > 6. loop back to 4 until good enough > 7. load functions from prep-tiger-geo-new.sql file and adjust any for your > data > 8. try it out! > > -Steve > > > On 1/9/2021 10:22 AM, Shaozhong SHI wrote: > >> Hi, Stephen, >> >> Many thanks. We are interested in it is working with the UK addresses. >> >> Please send me the link to this. >> >> Regards, >> >> David >> >> On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge < >> [email protected] <mailto:[email protected]>> >> wrote: >> >> David, >> >> Yup and this is just one a dozens of cases that you have to deal >> with. You are dealing with a natural language processing problem. >> And you have to deal with human input that has typos and >> abbreviations. >> >> These issues are what the address standardizer fixes. It tokenized >> the address and uses the gazette to standardize the terms and then >> classifies each term and assigns it to part of the address based >> on a grammar. >> >> So there is a simple solution, use my address standardizer, it is >> free, MIT license, it has a sample lexicon/ gazette and grammar >> for the UK, it is easy to modify these to fit your needs, and it >> just works. Oh if you want to do another county it also has sample >> files for 25 countries. >> >> Sent from my iPhone >> >> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski >>> <[email protected] <mailto:[email protected]>> wrote: >>> >>> >>> Hello, >>> >>> People make neural networks for this kind of task: >>> >>> https://github.com/openvenues/libpostal >>> <https://github.com/openvenues/libpostal> >>> >>> сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI >>> <[email protected] <mailto:[email protected]>> напісаў: >>> >>> Hi, Steve W, >>> >>> it is easy to parse addresses as tokens. But it is difficult >>> to put tokens in right columns, due to that the same address >>> could be expressed with partial address or full address. >>> >>> The same address can be written like, Flat 1 122 Great Avenue >>> London UK, or Flat 1 122 Greet Avenue Central London London >>> United Kingdom. >>> >>> When this happens, each address has different number of >>> tokens, so different numbers of tokens. Is there a way to >>> deal with this issue so that each token can get into right >>> column? >>> >>> Please enlighten me. >>> >>> Regards, >>> >>> David >>> >>> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge >>> <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> And I have create an address-standardizer project here >>> https://github.com/woodbri/address-standardizer >>> <https://github.com/woodbri/address-standardizer> which >>> is user >>> configurable. I might be over kill is you just want to >>> strip off the >>> number, in which case you might just use a SQL regexp >>> replace to remove it. >>> >>> -Steve W >>> >>> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote: >>> > PostGIS has address_standardizer extension that includes >>> > parse_address() and standardize_address() functions. >>> > >>> > -Steve W >>> > >>> > On 4/24/2020 9:54 PM, Imre Samu wrote: >>> >> > handle addresses in postgresql >>> >> >>> >> maybe you can use the >>> https://github.com/openvenues/libpostal >>> <https://github.com/openvenues/libpostal> library >>> >> with your favorite language bindings ( Python / Ruby / >>> Go / PHP / >>> >> Node / R / Java ...) >>> >> >>> >> or as a Postgres database extension: >>> >> >>> https://info.crunchydata.com/blog/quick-and-dirty-address-ma >>> tching-with-libpostal >>> <https://info.crunchydata.com/blog/quick-and-dirty-address-m >>> atching-with-libpostal> >>> >>> >> >>> >> https://github.com/pramsey/pgsql-postal >>> <https://github.com/pramsey/pgsql-postal> >>> >> >>> >> Regards, >>> >> Imre >>> >> >>> >> >>> >> >>> >> >>> >> Shaozhong SHI <[email protected] >>> <mailto:[email protected]> >>> >> <mailto:[email protected] >>> <mailto:[email protected]>>> ezt írta (időpont: >>> 2020. ápr. 25., >>> >> Szo, 2:49): >>> >> >>> >> I find this is a simple, but important question. >>> >> >>> >> How best to split numbers and the rest of address? >>> >> >>> >> For instance, one tricky one is as follows: >>> >> >>> >> 21-1 Great Avenue, a city, a country, this planet >>> >> >>> >> How to turn this into the following: >>> >> >>> >> column 1, column 2 >>> >> >>> >> 21-1 Great Avenue, a city, a >>> country, this planet >>> >> >>> >> Note: there is a hyphen in 21-1 >>> >> >>> >> Any clue? >>> >> >>> >> Regards, >>> >> >>> >> Shao >>> >> _______________________________________________ >>> >> postgis-users mailing list >>> >> [email protected] >>> <mailto:[email protected]> >>> <mailto:[email protected] >>> <mailto:[email protected]>> >>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users >>> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >>> >> >>> >> >>> >> _______________________________________________ >>> >> postgis-users mailing list >>> >> [email protected] >>> <mailto:[email protected]> >>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users >>> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >>> > >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> <mailto:[email protected]> >>> https://lists.osgeo.org/mailman/listinfo/postgis-users >>> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> <mailto:[email protected]> >>> https://lists.osgeo.org/mailman/listinfo/postgis-users >>> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] <mailto:[email protected]> >>> https://lists.osgeo.org/mailman/listinfo/postgis-users >>> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >>> >> _______________________________________________ >> postgis-users mailing list >> [email protected] <mailto:[email protected]> >> https://lists.osgeo.org/mailman/listinfo/postgis-users >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >> >> >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> https://lists.osgeo.org/mailman/listinfo/postgis-users >> > > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users >
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
