On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
Hi, Steve,

Many thanks.  Please send me the link to parse_address() and standardize_address().

If you already have postGIS installed then

create extension address_standardizer;

# \df parse_address
List of functions
 Schema |     Name      | Result data type | Argument data types |  Type
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
 public | parse_address | record           | text, OUT num text, OUT street text, OUT street2 text, OUT address1 text, OUT city text, OUT state text, OUT zip text, OUT zipplus text, OUT country text | normal
(1 row)

# select * from parse_address('123-2 main street city ny');
  num  |   street    | street2 |     address1      | city | state | zip | zipplus | country
-------+-------------+---------+-------------------+------+-------+-----+---------+---------
 123-2 | main street |         | 123-2 main street | city | NY |     |         | US
(1 row)

# \df standardize*
                                                    List of functions
 Schema |        Name         | Result data type |                      Argument data types                      | Type
--------+---------------------+------------------+---------------------------------------------------------------+--------
 public | standardize_address | stdaddr          | lextab text, gaztab text, rultab text, address text           | normal  public | standardize_address | stdaddr          | lextab text, gaztab text, rultab text, micro text, macro text | normal

You need tables for the lexicon, gazetteer, and rules, which should be included in the extension but I'm not seeing them. So you can grab these from:

https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql

and load them like:

psql mydb -f us-gaz.sql
psql mydb -f us-lex.sql
psql mydb -f us-rules.sql

# select * from standardize_address('lex', 'gaz', 'rules', '123-2 main street city ny');  building | house_num | predir | qual | pretype |  name  | suftype | sufdir | ruralroute | extra | city |  state   | country | postcode | box | unit
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
          | 123       |        |      |         | 2 MAIN | STREET |        |            |       | CITY | NEW YORK | USA     | |     |
(1 row)


This is a good example of why parsing addresses is so difficult. The rules for standardize_address do not account for a house number like "123-2", but the regexp in parse_address do handle it. It is easy to get the 80% right and very hard to get it much above that.

-Steve



I need to find these first before test-running.

Regards,

Shao

On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge <[email protected] <mailto:[email protected]>> wrote:

    Shao,

    '^( *Building *[0-9]+)?[- 0-9]*'

    or something like that should do it. But I think you will find that a
    more robust solution is to use parse_address() and/or
    standardize_address() as they will recognize a lot of other address
    constructs, like "apt 3a" for for example.

    parse_address() that a text field and breaks it into "house number
    street name" and "city state zip", but only works well in North
    America.

    standardize_address() that comes with postGIS, breaks the address
    down
    into its components and can separate out things like buildings, and
    apartment/unit specifiers so you can then take the fields you are
    interested in and recombine just them in a new string. Again, this
    works
    best in North America.

    My github address-standardizer is built to recognize address for most
    counties, but it can also be configured to recognize address
    standards
    for any county without too much effort. It compiles and installs as
    postgresql extension.

    Addresses are generally very messy and unless your addresses are vary
    simple you will be constantly fighting with this or that exception.

    -Steve

    On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
    > Is there a way to left trim including the building and number?
    >
    > Building 3  21-1              Great Avenue, a city, a country, this
    > planet.
    >
    > How to take way those things which are too local to an address?
    >
    > Regards,
    >
    > Shao
    >
    > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
    <[email protected] <mailto:[email protected]>
    > <mailto:[email protected] <mailto:[email protected]>>>
    wrote:
    >
    >     I find this is a simple, but important question.
    >
    >     How best to split numbers and the rest of address?
    >
    >     For instance, one tricky one is as follows:
    >
    >     21-1 Great Avenue, a city, a country, this planet
    >
    >     How to turn this into the following:
    >
    >     column 1,       column 2
    >
    >       21-1              Great Avenue, a city, a country, this planet
    >
    >     Note:  there is a hyphen in  21-1
    >
    >     Any clue?
    >
    >     Regards,
    >
    >     Shao
    >
    >
    > _______________________________________________
    > postgis-users mailing list
    > [email protected] <mailto:[email protected]>
    > https://lists.osgeo.org/mailman/listinfo/postgis-users

    _______________________________________________
    postgis-users mailing list
    [email protected] <mailto:[email protected]>
    https://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to