Shao,

I just remembered the lex, gaz, and rules data is in a separate extension. The correct way to install it is with:

create extension address_standardizer_data_us;

-Steve

On 4/26/2020 8:09 AM, Shaozhong SHI wrote:
Hi, Steve,

Thanks.

2 questions.

1.  How can we remove things like Room 2a, Buildings 2-6b and etc with regexp replace? 2.  Once extensions created, can these functions be adapted?  Are codes available ?    I will see whether to put it into a project, so that our programmers can have work to do.

Regards,

Shao

On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge <[email protected] <mailto:[email protected]>> wrote:

    On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
    > Hi, Steve,
    >
    > Many thanks.  Please send me the link to parse_address() and
    > standardize_address().

    If you already have postGIS installed then

    create extension address_standardizer;

    # \df parse_address
    List of functions
      Schema |     Name      | Result data type | Argument data types
    |  Type
    
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
      public | parse_address | record           | text, OUT num text, OUT
    street text, OUT street2 text, OUT address1 text, OUT city text, OUT
    state text, OUT zip text, OUT zipplus text, OUT country text | normal
    (1 row)

    # select * from parse_address('123-2 main street city ny');
       num  |   street    | street2 |     address1      | city | state
    | zip
    | zipplus | country
    
-------+-------------+---------+-------------------+------+-------+-----+---------+---------
      123-2 | main street |         | 123-2 main street | city | NY |
    |         | US
    (1 row)

    # \df standardize*
                                                         List of functions
      Schema |        Name         | Result data type |
    Argument data types                      | Type
    
--------+---------------------+------------------+---------------------------------------------------------------+--------
      public | standardize_address | stdaddr          | lextab text,
    gaztab
    text, rultab text, address text           | normal
      public | standardize_address | stdaddr          | lextab text,
    gaztab
    text, rultab text, micro text, macro text | normal

    You need tables for the lexicon, gazetteer, and rules, which
    should be
    included in the extension but I'm not seeing them. So you can grab
    these
    from:

    
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
    
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
    
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql

    and load them like:

    psql mydb -f us-gaz.sql
    psql mydb -f us-lex.sql
    psql mydb -f us-rules.sql

    # select * from standardize_address('lex', 'gaz', 'rules', '123-2
    main
    street city ny');
      building | house_num | predir | qual | pretype |  name  | suftype |
    sufdir | ruralroute | extra | city |  state   | country | postcode
    | box
    | unit
    
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
               | 123       |        |      |         | 2 MAIN | STREET
    |        |            |       | CITY | NEW YORK | USA     | |     |
    (1 row)


    This is a good example of why parsing addresses is so difficult. The
    rules for standardize_address do not account for a house number like
    "123-2", but the regexp in parse_address do handle it. It is easy
    to get
    the 80% right and very hard to get it much above that.

    -Steve


    >
    > I need to find these first before test-running.
    >
    > Regards,
    >
    > Shao
    >
    > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge
    > <[email protected]
    <mailto:[email protected]>
    <mailto:[email protected]
    <mailto:[email protected]>>>
    > wrote:
    >
    >     Shao,
    >
    >     '^( *Building *[0-9]+)?[- 0-9]*'
    >
    >     or something like that should do it. But I think you will
    find that a
    >     more robust solution is to use parse_address() and/or
    >     standardize_address() as they will recognize a lot of other
    address
    >     constructs, like "apt 3a" for for example.
    >
    >     parse_address() that a text field and breaks it into "house
    number
    >     street name" and "city state zip", but only works well in North
    >     America.
    >
    >     standardize_address() that comes with postGIS, breaks the
    address
    >     down
    >     into its components and can separate out things like
    buildings, and
    >     apartment/unit specifiers so you can then take the fields
    you are
    >     interested in and recombine just them in a new string.
    Again, this
    >     works
    >     best in North America.
    >
    >     My github address-standardizer is built to recognize address
    for most
    >     counties, but it can also be configured to recognize address
    >     standards
    >     for any county without too much effort. It compiles and
    installs as
    >     postgresql extension.
    >
    >     Addresses are generally very messy and unless your addresses
    are vary
    >     simple you will be constantly fighting with this or that
    exception.
    >
    >     -Steve
    >
    >     On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
    >     > Is there a way to left trim including the building and number?
    >     >
    >     > Building 3  21-1              Great Avenue, a city, a
    country, this
    >     > planet.
    >     >
    >     > How to take way those things which are too local to an
    address?
    >     >
    >     > Regards,
    >     >
    >     > Shao
    >     >
    >     > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
    >     <[email protected] <mailto:[email protected]>
    <mailto:[email protected] <mailto:[email protected]>>
    >     > <mailto:[email protected]
    <mailto:[email protected]> <mailto:[email protected]
    <mailto:[email protected]>>>>
    >     wrote:
    >     >
    >     >     I find this is a simple, but important question.
    >     >
    >     >     How best to split numbers and the rest of address?
    >     >
    >     >     For instance, one tricky one is as follows:
    >     >
    >     >     21-1 Great Avenue, a city, a country, this planet
    >     >
    >     >     How to turn this into the following:
    >     >
    >     >     column 1,       column 2
    >     >
    >     >       21-1              Great Avenue, a city, a country,
    this planet
    >     >
    >     >     Note:  there is a hyphen in  21-1
    >     >
    >     >     Any clue?
    >     >
    >     >     Regards,
    >     >
    >     >     Shao
    >     >
    >     >
    >     > _______________________________________________
    >     > postgis-users mailing list
    >     > [email protected]
    <mailto:[email protected]>
    <mailto:[email protected]
    <mailto:[email protected]>>
    >     > https://lists.osgeo.org/mailman/listinfo/postgis-users
    >
    >     _______________________________________________
    >     postgis-users mailing list
    > [email protected]
    <mailto:[email protected]>
    <mailto:[email protected]
    <mailto:[email protected]>>
    > https://lists.osgeo.org/mailman/listinfo/postgis-users
    >
    >
    > _______________________________________________
    > postgis-users mailing list
    > [email protected] <mailto:[email protected]>
    > https://lists.osgeo.org/mailman/listinfo/postgis-users

    _______________________________________________
    postgis-users mailing list
    [email protected] <mailto:[email protected]>
    https://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to