Hi, Steve, Thanks.
2 questions. 1. How can we remove things like Room 2a, Buildings 2-6b and etc with regexp replace? 2. Once extensions created, can these functions be adapted? Are codes available ? I will see whether to put it into a project, so that our programmers can have work to do. Regards, Shao On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge < [email protected]> wrote: > On 4/25/2020 7:19 PM, Shaozhong SHI wrote: > > Hi, Steve, > > > > Many thanks. Please send me the link to parse_address() and > > standardize_address(). > > If you already have postGIS installed then > > create extension address_standardizer; > > # \df parse_address > List of functions > Schema | Name | Result data type | Argument data types | Type > > --------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+-------- > public | parse_address | record | text, OUT num text, OUT > street text, OUT street2 text, OUT address1 text, OUT city text, OUT > state text, OUT zip text, OUT zipplus text, OUT country text | normal > (1 row) > > # select * from parse_address('123-2 main street city ny'); > num | street | street2 | address1 | city | state | zip > | zipplus | country > > -------+-------------+---------+-------------------+------+-------+-----+---------+--------- > 123-2 | main street | | 123-2 main street | city | NY | > | | US > (1 row) > > # \df standardize* > List of functions > Schema | Name | Result data type | > Argument data types | Type > > --------+---------------------+------------------+---------------------------------------------------------------+-------- > public | standardize_address | stdaddr | lextab text, gaztab > text, rultab text, address text | normal > public | standardize_address | stdaddr | lextab text, gaztab > text, rultab text, micro text, macro text | normal > > You need tables for the lexicon, gazetteer, and rules, which should be > included in the extension but I'm not seeing them. So you can grab these > from: > > > https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql > > https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql > > https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql > > and load them like: > > psql mydb -f us-gaz.sql > psql mydb -f us-lex.sql > psql mydb -f us-rules.sql > > # select * from standardize_address('lex', 'gaz', 'rules', '123-2 main > street city ny'); > building | house_num | predir | qual | pretype | name | suftype | > sufdir | ruralroute | extra | city | state | country | postcode | box > | unit > > ----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------ > | 123 | | | | 2 MAIN | STREET > | | | | CITY | NEW YORK | USA | | | > (1 row) > > > This is a good example of why parsing addresses is so difficult. The > rules for standardize_address do not account for a house number like > "123-2", but the regexp in parse_address do handle it. It is easy to get > the 80% right and very hard to get it much above that. > > -Steve > > > > > > I need to find these first before test-running. > > > > Regards, > > > > Shao > > > > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge > > <[email protected] <mailto:[email protected]>> > > wrote: > > > > Shao, > > > > '^( *Building *[0-9]+)?[- 0-9]*' > > > > or something like that should do it. But I think you will find that a > > more robust solution is to use parse_address() and/or > > standardize_address() as they will recognize a lot of other address > > constructs, like "apt 3a" for for example. > > > > parse_address() that a text field and breaks it into "house number > > street name" and "city state zip", but only works well in North > > America. > > > > standardize_address() that comes with postGIS, breaks the address > > down > > into its components and can separate out things like buildings, and > > apartment/unit specifiers so you can then take the fields you are > > interested in and recombine just them in a new string. Again, this > > works > > best in North America. > > > > My github address-standardizer is built to recognize address for most > > counties, but it can also be configured to recognize address > > standards > > for any county without too much effort. It compiles and installs as > > postgresql extension. > > > > Addresses are generally very messy and unless your addresses are vary > > simple you will be constantly fighting with this or that exception. > > > > -Steve > > > > On 4/25/2020 2:55 PM, Shaozhong SHI wrote: > > > Is there a way to left trim including the building and number? > > > > > > Building 3 21-1 Great Avenue, a city, a country, this > > > planet. > > > > > > How to take way those things which are too local to an address? > > > > > > Regards, > > > > > > Shao > > > > > > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI > > <[email protected] <mailto:[email protected]> > > > <mailto:[email protected] <mailto:[email protected]>>> > > wrote: > > > > > > I find this is a simple, but important question. > > > > > > How best to split numbers and the rest of address? > > > > > > For instance, one tricky one is as follows: > > > > > > 21-1 Great Avenue, a city, a country, this planet > > > > > > How to turn this into the following: > > > > > > column 1, column 2 > > > > > > 21-1 Great Avenue, a city, a country, this > planet > > > > > > Note: there is a hyphen in 21-1 > > > > > > Any clue? > > > > > > Regards, > > > > > > Shao > > > > > > > > > _______________________________________________ > > > postgis-users mailing list > > > [email protected] <mailto: > [email protected]> > > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > > _______________________________________________ > > postgis-users mailing list > > [email protected] <mailto:[email protected]> > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > > > > _______________________________________________ > > postgis-users mailing list > > [email protected] > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
