Shao,
I just remembered the lex, gaz, and rules data is in a separate
extension. The correct way to install it is with:
create extension address_standardizer_data_us;
-Steve
On 4/26/2020 8:09 AM, Shaozhong SHI wrote:
Hi, Steve,
Thanks.
2 questions.
1. How can we remove things like Room 2a, Buildings 2-6b and etc with
regexp replace?
2. Once extensions created, can these functions be adapted? Are
codes available ? I will see whether to put it into a project, so
that our programmers can have work to do.
Regards,
Shao
On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge
<[email protected] <mailto:[email protected]>>
wrote:
On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
> Hi, Steve,
>
> Many thanks. Please send me the link to parse_address() and
> standardize_address().
If you already have postGIS installed then
create extension address_standardizer;
# \df parse_address
List of functions
Schema | Name | Result data type | Argument data types
| Type
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
public | parse_address | record | text, OUT num text, OUT
street text, OUT street2 text, OUT address1 text, OUT city text, OUT
state text, OUT zip text, OUT zipplus text, OUT country text | normal
(1 row)
# select * from parse_address('123-2 main street city ny');
num | street | street2 | address1 | city | state
| zip
| zipplus | country
-------+-------------+---------+-------------------+------+-------+-----+---------+---------
123-2 | main street | | 123-2 main street | city | NY |
| | US
(1 row)
# \df standardize*
List of functions
Schema | Name | Result data type |
Argument data types | Type
--------+---------------------+------------------+---------------------------------------------------------------+--------
public | standardize_address | stdaddr | lextab text,
gaztab
text, rultab text, address text | normal
public | standardize_address | stdaddr | lextab text,
gaztab
text, rultab text, micro text, macro text | normal
You need tables for the lexicon, gazetteer, and rules, which
should be
included in the extension but I'm not seeing them. So you can grab
these
from:
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql
and load them like:
psql mydb -f us-gaz.sql
psql mydb -f us-lex.sql
psql mydb -f us-rules.sql
# select * from standardize_address('lex', 'gaz', 'rules', '123-2
main
street city ny');
building | house_num | predir | qual | pretype | name | suftype |
sufdir | ruralroute | extra | city | state | country | postcode
| box
| unit
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
| 123 | | | | 2 MAIN | STREET
| | | | CITY | NEW YORK | USA | | |
(1 row)
This is a good example of why parsing addresses is so difficult. The
rules for standardize_address do not account for a house number like
"123-2", but the regexp in parse_address do handle it. It is easy
to get
the 80% right and very hard to get it much above that.
-Steve
>
> I need to find these first before test-running.
>
> Regards,
>
> Shao
>
> On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge
> <[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>>
> wrote:
>
> Shao,
>
> '^( *Building *[0-9]+)?[- 0-9]*'
>
> or something like that should do it. But I think you will
find that a
> more robust solution is to use parse_address() and/or
> standardize_address() as they will recognize a lot of other
address
> constructs, like "apt 3a" for for example.
>
> parse_address() that a text field and breaks it into "house
number
> street name" and "city state zip", but only works well in North
> America.
>
> standardize_address() that comes with postGIS, breaks the
address
> down
> into its components and can separate out things like
buildings, and
> apartment/unit specifiers so you can then take the fields
you are
> interested in and recombine just them in a new string.
Again, this
> works
> best in North America.
>
> My github address-standardizer is built to recognize address
for most
> counties, but it can also be configured to recognize address
> standards
> for any county without too much effort. It compiles and
installs as
> postgresql extension.
>
> Addresses are generally very messy and unless your addresses
are vary
> simple you will be constantly fighting with this or that
exception.
>
> -Steve
>
> On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
> > Is there a way to left trim including the building and number?
> >
> > Building 3 21-1 Great Avenue, a city, a
country, this
> > planet.
> >
> > How to take way those things which are too local to an
address?
> >
> > Regards,
> >
> > Shao
> >
> > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
> <[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
> > <mailto:[email protected]
<mailto:[email protected]> <mailto:[email protected]
<mailto:[email protected]>>>>
> wrote:
> >
> > I find this is a simple, but important question.
> >
> > How best to split numbers and the rest of address?
> >
> > For instance, one tricky one is as follows:
> >
> > 21-1 Great Avenue, a city, a country, this planet
> >
> > How to turn this into the following:
> >
> > column 1, column 2
> >
> > 21-1 Great Avenue, a city, a country,
this planet
> >
> > Note: there is a hyphen in 21-1
> >
> > Any clue?
> >
> > Regards,
> >
> > Shao
> >
> >
> > _______________________________________________
> > postgis-users mailing list
> > [email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> [email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> [email protected] <mailto:[email protected]>
> https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected] <mailto:[email protected]>
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users