On 4/29/2020 9:15 AM, Shaozhong SHI wrote:
Hi, Steve,

I have address_standardizer extension created.

It is not like other functions.  How can I review its code? Does it need to be adapted to handle addresses from other countries?

As I tried to indicate, the PostGIS address_standardizer extension is extremely hard to understand, and its files are nearly impossible to customize. I migrated the PAGC code into this extension, but I barely understood the code. Conceptually, the ideas are straightforward, but the code is very hard to follow and understand, so I ported it as a black box.
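
As a rough illustration of why there is nothing to review from inside the database, a catalog query along these lines (a sketch, assuming the extension is installed in the current database) should show the functions as language "c", with only a shared library path and a symbol name rather than SQL source:

```sql
-- List the implementation language and the backing library/symbol of the
-- extension's functions, using the standard PostgreSQL system catalogs.
SELECT p.proname,
       l.lanname AS language,
       p.probin  AS library,
       p.prosrc  AS symbol
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
WHERE p.proname IN ('parse_address', 'standardize_address');
```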

If the existing address_standardizer_data_us does not work for you or you need to support other countries, you will need to build and install the one here:

https://github.com/woodbri/address-standardizer

This was written as a replacement.
It has sample files for 25 countries here:

https://github.com/woodbri/address-standardizer/tree/develop/data/sample

that you can customize and has (hopefully) good documentation here:

https://github.com/woodbri/address-standardizer/blob/develop/README.md
https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md
https://github.com/woodbri/address-standardizer/tree/develop/data

Also, this code is written in C++ and I hope it is written to be easier to understand and review.

I'm not sure I can help much more than this, unless you have questions on https://github.com/woodbri/address-standardizer, which we should probably take off the PostGIS list, as it is not part of PostGIS.

-Steve


Regards,

Shao

On Sun, 26 Apr 2020 at 17:23, Stephen Woodbridge <[email protected]> wrote:

    Shao,

    I just remembered the lex, gaz, and rules data is in a separate
    extension. The correct way to install it is with:

    create extension address_standardizer_data_us;
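
    With that extension in place, a call could look roughly like the sketch
    below. It assumes the data extension's tables are named us_lex, us_gaz,
    and us_rules, and it reuses the sample address from earlier in this thread:

    ```sql
    -- Standardize one address using the US lexicon, gazetteer, and rules
    -- tables, and pick out a few of the stdaddr fields.
    SELECT house_num, name, suftype, city, state
    FROM standardize_address('us_lex', 'us_gaz', 'us_rules',
                             '123-2 main street city ny');
    ```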

    -Steve

    On 4/26/2020 8:09 AM, Shaozhong SHI wrote:
    > Hi, Steve,
    >
    > Thanks.
    >
    > 2 questions.
    >
    > 1.  How can we remove things like Room 2a, Buildings 2-6b, etc. with
    > regexp replace?
    > 2.  Once the extensions are created, can these functions be adapted?  Is
    > the source code available?  I will see whether to put it into a project, so
    > that our programmers can have work to do.
    >
    > Regards,
    >
    > Shao
    >
    > On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge <[email protected]> wrote:
    >
    >     On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
    >     > Hi, Steve,
    >     >
    >     > Many thanks.  Please send me the link to parse_address() and
    >     > standardize_address().
    >
    >     If you already have PostGIS installed, then run:
    >
    >     create extension address_standardizer;
    >
    >     # \df parse_address
    >     List of functions
    >      Schema |     Name      | Result data type |                                                                    Argument data types                                                                    | Type
    >     --------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
    >      public | parse_address | record           | text, OUT num text, OUT street text, OUT street2 text, OUT address1 text, OUT city text, OUT state text, OUT zip text, OUT zipplus text, OUT country text | normal
    >     (1 row)
    >
    >     # select * from parse_address('123-2 main street city ny');
    >       num  |   street    | street2 |     address1      | city | state | zip | zipplus | country
    >     -------+-------------+---------+-------------------+------+-------+-----+---------+---------
    >      123-2 | main street |         | 123-2 main street | city | NY    |     |         | US
    >     (1 row)
    >
    >     # \df standardize*
    >     List of functions
    >      Schema |        Name         | Result data type |                      Argument data types                      | Type
    >     --------+---------------------+------------------+---------------------------------------------------------------+--------
    >      public | standardize_address | stdaddr          | lextab text, gaztab text, rultab text, address text           | normal
    >      public | standardize_address | stdaddr          | lextab text, gaztab text, rultab text, micro text, macro text | normal
    >
    >     You need tables for the lexicon, gazetteer, and rules, which should be
    >     included in the extension, but I'm not seeing them. So you can grab
    >     these from:
    >
    >     https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
    >     https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
    >     https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql
    >
    >     and load them like:
    >
    >     psql mydb -f us-gaz.sql
    >     psql mydb -f us-lex.sql
    >     psql mydb -f us-rules.sql
    >
    >     # select * from standardize_address('lex', 'gaz', 'rules', '123-2 main street city ny');
    >      building | house_num | predir | qual | pretype |  name  | suftype | sufdir | ruralroute | extra | city |  state   | country | postcode | box | unit
    >     ----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
    >               | 123       |        |      |         | 2 MAIN | STREET  |        |            |       | CITY | NEW YORK | USA     |          |     |
    >     (1 row)
    >
    >
    >     This is a good example of why parsing addresses is so difficult. The
    >     rules for standardize_address do not account for a house number like
    >     "123-2", but the regexps in parse_address do handle it. It is easy to
    >     get 80% right and very hard to get much above that.
    >
    >     -Steve
    >
    >
    >     >
    >     > I need to find these first before test-running.
    >     >
    >     > Regards,
    >     >
    >     > Shao
    >     >
    >     > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge <[email protected]> wrote:
    >     >
    >     >     Shao,
    >     >
    >     >     '^( *Building *[0-9]+)?[- 0-9]*'
    >     >
    >     >     or something like that should do it. But I think you will find that a
    >     >     more robust solution is to use parse_address() and/or
    >     >     standardize_address(), as they will recognize a lot of other address
    >     >     constructs, like "apt 3a" for example.
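    >     >
    >     >     As a rough sketch (the two sample strings are taken from this thread),
    >     >     the pattern above could be applied with regexp_replace() like this:
    >     >
    >     >     ```sql
    >     >     -- Strip an optional leading "Building N" plus the house number,
    >     >     -- leaving the rest of the address in the "trimmed" column.
    >     >     SELECT addr,
    >     >            regexp_replace(addr, '^( *Building *[0-9]+)?[- 0-9]*', '') AS trimmed
    >     >     FROM (VALUES
    >     >             ('Building 3  21-1              Great Avenue, a city, a country, this planet.'),
    >     >             ('21-1 Great Avenue, a city, a country, this planet')
    >     >          ) AS t(addr);
    >     >     ```
    >     >
    >     >     Be aware that the trailing [- 0-9]* will also eat the leading digits
    >     >     of a street name such as "5th Avenue", and that it does not handle
    >     >     prefixes containing letters, such as "Room 2a".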
    >     >
    >     >     parse_address() takes a text field and breaks it into "house number
    >     >     street name" and "city state zip", but it only works well in North
    >     >     America.
    >     >
    >     >     standardize_address(), which comes with PostGIS, breaks the address down
    >     >     into its components and can separate out things like buildings and
    >     >     apartment/unit specifiers, so you can then take the fields you are
    >     >     interested in and recombine just them into a new string. Again, this
    >     >     works best in North America.
    >     >
    >     >     My GitHub address-standardizer is built to recognize addresses for most
    >     >     countries, and it can also be configured to recognize address standards
    >     >     for any country without too much effort. It compiles and installs as
    >     >     a PostgreSQL extension.
    >     >
    >     >     Addresses are generally very messy, and unless your addresses are very
    >     >     simple you will be constantly fighting with this or that exception.
    >     >
    >     >     -Steve
    >     >
    >     >     On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
    >     >     > Is there a way to left trim, including the building and number?
    >     >     >
    >     >     > Building 3  21-1              Great Avenue, a city, a country, this
    >     >     > planet.
    >     >     >
    >     >     > How to take away those things which are too local to an address?
    >     >     >
    >     >     > Regards,
    >     >     >
    >     >     > Shao
    >     >     >
    >     >     > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI <[email protected]> wrote:
    >     >     >
    >     >     >     I find this is a simple, but important question.
    >     >     >
    >     >     >     How best to split numbers and the rest of address?
    >     >     >
    >     >     >     For instance, one tricky one is as follows:
    >     >     >
    >     >     >     21-1 Great Avenue, a city, a country, this planet
    >     >     >
    >     >     >     How to turn this into the following:
    >     >     >
    >     >     >     column 1,       column 2
    >     >     >
    >     >     >       21-1              Great Avenue, a city, a country, this planet
    >     >     >
    >     >     >     Note:  there is a hyphen in  21-1
    >     >     >
    >     >     >     Any clue?
    >     >     >
    >     >     >     Regards,
    >     >     >
    >     >     >     Shao
    >     >     >
    >     >     >
    >     >     > _______________________________________________
    >     >     > postgis-users mailing list
    >     >     > [email protected]
    >     >     > https://lists.osgeo.org/mailman/listinfo/postgis-users