On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
Hi, Steve,
Many thanks. Please send me the link to parse_address() and
standardize_address().
If you already have postGIS installed then
create extension address_standardizer;
# \df parse_address
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
public | parse_address | record | text, OUT num text, OUT
street text, OUT street2 text, OUT address1 text, OUT city text, OUT
state text, OUT zip text, OUT zipplus text, OUT country text | normal
(1 row)
# select * from parse_address('123-2 main street city ny');
num | street | street2 | address1 | city | state | zip
| zipplus | country
-------+-------------+---------+-------------------+------+-------+-----+---------+---------
123-2 | main street | | 123-2 main street | city | NY |
| | US
(1 row)
# \df standardize*
List of functions
Schema | Name | Result data type |
Argument data types | Type
--------+---------------------+------------------+---------------------------------------------------------------+--------
public | standardize_address | stdaddr | lextab text, gaztab
text, rultab text, address text | normal
public | standardize_address | stdaddr | lextab text, gaztab
text, rultab text, micro text, macro text | normal
You need tables for the lexicon, gazetteer, and rules, which should be
included in the extension but I'm not seeing them. So you can grab these
from:
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql
and load them like:
psql mydb -f us-gaz.sql
psql mydb -f us-lex.sql
psql mydb -f us-rules.sql
# select * from standardize_address('lex', 'gaz', 'rules', '123-2 main
street city ny');
building | house_num | predir | qual | pretype | name | suftype |
sufdir | ruralroute | extra | city | state | country | postcode | box
| unit
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
| 123 | | | | 2 MAIN | STREET
| | | | CITY | NEW YORK | USA | | |
(1 row)
This is a good example of why parsing addresses is so difficult. The
rules for standardize_address do not account for a house number like
"123-2", but the regexp in parse_address do handle it. It is easy to get
the 80% right and very hard to get it much above that.
-Steve
I need to find these first before test-running.
Regards,
Shao
On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge
<[email protected] <mailto:[email protected]>>
wrote:
Shao,
'^( *Building *[0-9]+)?[- 0-9]*'
or something like that should do it. But I think you will find that a
more robust solution is to use parse_address() and/or
standardize_address() as they will recognize a lot of other address
constructs, like "apt 3a" for for example.
parse_address() that a text field and breaks it into "house number
street name" and "city state zip", but only works well in North
America.
standardize_address() that comes with postGIS, breaks the address
down
into its components and can separate out things like buildings, and
apartment/unit specifiers so you can then take the fields you are
interested in and recombine just them in a new string. Again, this
works
best in North America.
My github address-standardizer is built to recognize address for most
counties, but it can also be configured to recognize address
standards
for any county without too much effort. It compiles and installs as
postgresql extension.
Addresses are generally very messy and unless your addresses are vary
simple you will be constantly fighting with this or that exception.
-Steve
On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
> Is there a way to left trim including the building and number?
>
> Building 3 21-1 Great Avenue, a city, a country, this
> planet.
>
> How to take way those things which are too local to an address?
>
> Regards,
>
> Shao
>
> On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
<[email protected] <mailto:[email protected]>
> <mailto:[email protected] <mailto:[email protected]>>>
wrote:
>
> I find this is a simple, but important question.
>
> How best to split numbers and the rest of address?
>
> For instance, one tricky one is as follows:
>
> 21-1 Great Avenue, a city, a country, this planet
>
> How to turn this into the following:
>
> column 1, column 2
>
> 21-1 Great Avenue, a city, a country, this planet
>
> Note: there is a hyphen in 21-1
>
> Any clue?
>
> Regards,
>
> Shao
>
>
> _______________________________________________
> postgis-users mailing list
> [email protected] <mailto:[email protected]>
> https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected] <mailto:[email protected]>
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users