On 4/26/2020 8:09 AM, Shaozhong SHI wrote:
Hi, Steve,

Thanks.

2 questions.

1.  How can we remove things like Room 2a, Buildings 2-6b and etc with regexp replace?
You need to read up on how to write regexp, becuase you will keep finding cases like this you want to add to your regexp and you need to understand what they are doing. Compare the first one I gave you to this and make sure you understand why I made these changes to it. Also this is a little outside the scope of the PostGIS list.

'^( *(Buildings?|Bldg|Room) *[0-9]+)?[- 0-9]*[a-zA-Z]? '

2.  Once extensions created, can these functions be adapted?  Are codes available ?    I will see whether to put it into a project, so that our programmers can have work to do.

The code for these in in the PostGIS repository, but if you are serious about making changes to them, then I recommend working with https://github.com/woodbri/address-standardizer because I rewrote these function to be easier to modify. The existing code is very hard to understand and very difficult to make changes to without breaking things. My new code is hopefully well documented in the repository and should be straight forward to work with. FYI, I have never built/tested with PostgreSQL 12+ so it might need some changes to support that, but it should be good for 9, 10 and 11 versions.

-Steve


Regards,

Shao

On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge <[email protected] <mailto:[email protected]>> wrote:

    On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
    > Hi, Steve,
    >
    > Many thanks.  Please send me the link to parse_address() and
    > standardize_address().

    If you already have postGIS installed then

    create extension address_standardizer;

    # \df parse_address
    List of functions
      Schema |     Name      | Result data type | Argument data types
    |  Type
    
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
      public | parse_address | record           | text, OUT num text, OUT
    street text, OUT street2 text, OUT address1 text, OUT city text, OUT
    state text, OUT zip text, OUT zipplus text, OUT country text | normal
    (1 row)

    # select * from parse_address('123-2 main street city ny');
       num  |   street    | street2 |     address1      | city | state
    | zip
    | zipplus | country
    
-------+-------------+---------+-------------------+------+-------+-----+---------+---------
      123-2 | main street |         | 123-2 main street | city | NY |
    |         | US
    (1 row)

    # \df standardize*
                                                         List of functions
      Schema |        Name         | Result data type |
    Argument data types                      | Type
    
--------+---------------------+------------------+---------------------------------------------------------------+--------
      public | standardize_address | stdaddr          | lextab text,
    gaztab
    text, rultab text, address text           | normal
      public | standardize_address | stdaddr          | lextab text,
    gaztab
    text, rultab text, micro text, macro text | normal

    You need tables for the lexicon, gazetteer, and rules, which
    should be
    included in the extension but I'm not seeing them. So you can grab
    these
    from:

    
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
    
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
    
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql

    and load them like:

    psql mydb -f us-gaz.sql
    psql mydb -f us-lex.sql
    psql mydb -f us-rules.sql

    # select * from standardize_address('lex', 'gaz', 'rules', '123-2
    main
    street city ny');
      building | house_num | predir | qual | pretype |  name  | suftype |
    sufdir | ruralroute | extra | city |  state   | country | postcode
    | box
    | unit
    
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
               | 123       |        |      |         | 2 MAIN | STREET
    |        |            |       | CITY | NEW YORK | USA     | |     |
    (1 row)


    This is a good example of why parsing addresses is so difficult. The
    rules for standardize_address do not account for a house number like
    "123-2", but the regexp in parse_address do handle it. It is easy
    to get
    the 80% right and very hard to get it much above that.

    -Steve


    >
    > I need to find these first before test-running.
    >
    > Regards,
    >
    > Shao
    >
    > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge
    > <[email protected]
    <mailto:[email protected]>
    <mailto:[email protected]
    <mailto:[email protected]>>>
    > wrote:
    >
    >     Shao,
    >
    >     '^( *Building *[0-9]+)?[- 0-9]*'
    >
    >     or something like that should do it. But I think you will
    find that a
    >     more robust solution is to use parse_address() and/or
    >     standardize_address() as they will recognize a lot of other
    address
    >     constructs, like "apt 3a" for for example.
    >
    >     parse_address() that a text field and breaks it into "house
    number
    >     street name" and "city state zip", but only works well in North
    >     America.
    >
    >     standardize_address() that comes with postGIS, breaks the
    address
    >     down
    >     into its components and can separate out things like
    buildings, and
    >     apartment/unit specifiers so you can then take the fields
    you are
    >     interested in and recombine just them in a new string.
    Again, this
    >     works
    >     best in North America.
    >
    >     My github address-standardizer is built to recognize address
    for most
    >     counties, but it can also be configured to recognize address
    >     standards
    >     for any county without too much effort. It compiles and
    installs as
    >     postgresql extension.
    >
    >     Addresses are generally very messy and unless your addresses
    are vary
    >     simple you will be constantly fighting with this or that
    exception.
    >
    >     -Steve
    >
    >     On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
    >     > Is there a way to left trim including the building and number?
    >     >
    >     > Building 3  21-1              Great Avenue, a city, a
    country, this
    >     > planet.
    >     >
    >     > How to take way those things which are too local to an
    address?
    >     >
    >     > Regards,
    >     >
    >     > Shao
    >     >
    >     > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
    >     <[email protected] <mailto:[email protected]>
    <mailto:[email protected] <mailto:[email protected]>>
    >     > <mailto:[email protected]
    <mailto:[email protected]> <mailto:[email protected]
    <mailto:[email protected]>>>>
    >     wrote:
    >     >
    >     >     I find this is a simple, but important question.
    >     >
    >     >     How best to split numbers and the rest of address?
    >     >
    >     >     For instance, one tricky one is as follows:
    >     >
    >     >     21-1 Great Avenue, a city, a country, this planet
    >     >
    >     >     How to turn this into the following:
    >     >
    >     >     column 1,       column 2
    >     >
    >     >       21-1              Great Avenue, a city, a country,
    this planet
    >     >
    >     >     Note:  there is a hyphen in  21-1
    >     >
    >     >     Any clue?
    >     >
    >     >     Regards,
    >     >
    >     >     Shao
    >     >
    >     >
    >     > _______________________________________________
    >     > postgis-users mailing list
    >     > [email protected]
    <mailto:[email protected]>
    <mailto:[email protected]
    <mailto:[email protected]>>
    >     > https://lists.osgeo.org/mailman/listinfo/postgis-users
    >
    >     _______________________________________________
    >     postgis-users mailing list
    > [email protected]
    <mailto:[email protected]>
    <mailto:[email protected]
    <mailto:[email protected]>>
    > https://lists.osgeo.org/mailman/listinfo/postgis-users
    >
    >
    > _______________________________________________
    > postgis-users mailing list
    > [email protected] <mailto:[email protected]>
    > https://lists.osgeo.org/mailman/listinfo/postgis-users

    _______________________________________________
    postgis-users mailing list
    [email protected] <mailto:[email protected]>
    https://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to