Hi, Steve, Many thanks.
We are strongly interested in dealing with UK addresses. Last time, I passed on messages for testing. I was told not working for UK addresses. I will try it again. Anything else do we have to watch out for? Regards, David On Sat, 9 Jan 2021 at 15:53, Stephen Woodbridge < [email protected]> wrote: > David, > > This is the link to the address standardizer: > https://github.com/woodbri/address-standardizer > > This is a link to all my code that I developed consulting. It includes a > few SQL geocoders based on the code above. And has some README files > discussing how to build a geocoder which is the basis for how the > geocoders work. > > https://github.com/woodbri/imaptools.com > > this is the geocoder for Tiger data, but the code is essentially the > same for every country because the when you load country specific data > into the database it goes into its own table and then you standardize > that data into stdstreets table and all queries are done against the > stdstreets table and you only have to tweak the address range > interpolation function which needs to access the source streets table > for the geometry and house number ranges. > > > https://github.com/woodbri/imaptools.com/blob/master/sql-scripts/geocoder/prep-tiger-geo-new.sql > > I would approach this by: > > 1. get the address standardizer compiled and installed. I can help if > you run into problems or have questions. > 2. load your UK street data into rawdata schema, ideally it would be > best if we can create a table/view that presents this data as a single > table where each record represents one side of the street and one > jurisdiction this may mean that a single record in your source data will > generate multiple records in this table/view (this greatly simplifies > the coding and performance later) > 3. look at the prep-tiger-geo-new.sql file > 4. create a stdstreets table and standardize your table/view data into it > 5. look at standardization failures and adjust lexicon and grammar as > needed > 6. loop back to 4 until good enough > 7. load functions from prep-tiger-geo-new.sql file and adjust any for > your data > 8. try it out! > > -Steve > > > On 1/9/2021 10:22 AM, Shaozhong SHI wrote: > > Hi, Stephen, > > > > Many thanks. We are interested in it is working with the UK addresses. > > > > Please send me the link to this. > > > > Regards, > > > > David > > > > On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge > > <[email protected] <mailto:[email protected]>> > > wrote: > > > > David, > > > > Yup and this is just one a dozens of cases that you have to deal > > with. You are dealing with a natural language processing problem. > > And you have to deal with human input that has typos and > > abbreviations. > > > > These issues are what the address standardizer fixes. It tokenized > > the address and uses the gazette to standardize the terms and then > > classifies each term and assigns it to part of the address based > > on a grammar. > > > > So there is a simple solution, use my address standardizer, it is > > free, MIT license, it has a sample lexicon/ gazette and grammar > > for the UK, it is easy to modify these to fit your needs, and it > > just works. Oh if you want to do another county it also has sample > > files for 25 countries. > > > > Sent from my iPhone > > > >> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski > >> <[email protected] <mailto:[email protected]>> wrote: > >> > >> > >> Hello, > >> > >> People make neural networks for this kind of task: > >> > >> https://github.com/openvenues/libpostal > >> <https://github.com/openvenues/libpostal> > >> > >> сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI > >> <[email protected] <mailto:[email protected]>> напісаў: > >> > >> Hi, Steve W, > >> > >> it is easy to parse addresses as tokens. But it is difficult > >> to put tokens in right columns, due to that the same address > >> could be expressed with partial address or full address. > >> > >> The same address can be written like, Flat 1 122 Great Avenue > >> London UK, or Flat 1 122 Greet Avenue Central London London > >> United Kingdom. > >> > >> When this happens, each address has different number of > >> tokens, so different numbers of tokens. Is there a way to > >> deal with this issue so that each token can get into right > >> column? > >> > >> Please enlighten me. > >> > >> Regards, > >> > >> David > >> > >> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge > >> <[email protected] > >> <mailto:[email protected]>> wrote: > >> > >> And I have create an address-standardizer project here > >> https://github.com/woodbri/address-standardizer > >> <https://github.com/woodbri/address-standardizer> which > >> is user > >> configurable. I might be over kill is you just want to > >> strip off the > >> number, in which case you might just use a SQL regexp > >> replace to remove it. > >> > >> -Steve W > >> > >> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote: > >> > PostGIS has address_standardizer extension that includes > >> > parse_address() and standardize_address() functions. > >> > > >> > -Steve W > >> > > >> > On 4/24/2020 9:54 PM, Imre Samu wrote: > >> >> > handle addresses in postgresql > >> >> > >> >> maybe you can use the > >> https://github.com/openvenues/libpostal > >> <https://github.com/openvenues/libpostal> library > >> >> with your favorite language bindings ( Python / Ruby / > >> Go / PHP / > >> >> Node / R / Java ...) > >> >> > >> >> or as a Postgres database extension: > >> >> > >> > https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal > >> < > https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal > > > >> > >> >> > >> >> https://github.com/pramsey/pgsql-postal > >> <https://github.com/pramsey/pgsql-postal> > >> >> > >> >> Regards, > >> >> Imre > >> >> > >> >> > >> >> > >> >> > >> >> Shaozhong SHI <[email protected] > >> <mailto:[email protected]> > >> >> <mailto:[email protected] > >> <mailto:[email protected]>>> ezt írta (időpont: > >> 2020. ápr. 25., > >> >> Szo, 2:49): > >> >> > >> >> I find this is a simple, but important question. > >> >> > >> >> How best to split numbers and the rest of address? > >> >> > >> >> For instance, one tricky one is as follows: > >> >> > >> >> 21-1 Great Avenue, a city, a country, this planet > >> >> > >> >> How to turn this into the following: > >> >> > >> >> column 1, column 2 > >> >> > >> >> 21-1 Great Avenue, a city, a > >> country, this planet > >> >> > >> >> Note: there is a hyphen in 21-1 > >> >> > >> >> Any clue? > >> >> > >> >> Regards, > >> >> > >> >> Shao > >> >> _______________________________________________ > >> >> postgis-users mailing list > >> >> [email protected] > >> <mailto:[email protected]> > >> <mailto:[email protected] > >> <mailto:[email protected]>> > >> >> https://lists.osgeo.org/mailman/listinfo/postgis-users > >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> > >> >> > >> >> > >> >> _______________________________________________ > >> >> postgis-users mailing list > >> >> [email protected] > >> <mailto:[email protected]> > >> >> https://lists.osgeo.org/mailman/listinfo/postgis-users > >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> > >> > > >> > >> _______________________________________________ > >> postgis-users mailing list > >> [email protected] > >> <mailto:[email protected]> > >> https://lists.osgeo.org/mailman/listinfo/postgis-users > >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> > >> > >> _______________________________________________ > >> postgis-users mailing list > >> [email protected] > >> <mailto:[email protected]> > >> https://lists.osgeo.org/mailman/listinfo/postgis-users > >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> > >> > >> _______________________________________________ > >> postgis-users mailing list > >> [email protected] <mailto:[email protected] > > > >> https://lists.osgeo.org/mailman/listinfo/postgis-users > >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> > > _______________________________________________ > > postgis-users mailing list > > [email protected] <mailto:[email protected]> > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > <https://lists.osgeo.org/mailman/listinfo/postgis-users> > > > > > > _______________________________________________ > > postgis-users mailing list > > [email protected] > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users >
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
