Hi, Steve, Thank you very much.
I am testing and evaluating possibilities and things which are worthwhile to do. Things which have solid proof. Regards, Shao On Sun, 26 Apr 2020 at 17:13, Stephen Woodbridge < [email protected]> wrote: > On 4/26/2020 8:09 AM, Shaozhong SHI wrote: > > Hi, Steve, > > > > Thanks. > > > > 2 questions. > > > > 1. How can we remove things like Room 2a, Buildings 2-6b and etc with > > regexp replace? > You need to read up on how to write regexp, becuase you will keep > finding cases like this you want to add to your regexp and you need to > understand what they are doing. Compare the first one I gave you to this > and make sure you understand why I made these changes to it. Also this > is a little outside the scope of the PostGIS list. > > '^( *(Buildings?|Bldg|Room) *[0-9]+)?[- 0-9]*[a-zA-Z]? ' > > > 2. Once extensions created, can these functions be adapted? Are > > codes available ? I will see whether to put it into a project, so > > that our programmers can have work to do. > > The code for these in in the PostGIS repository, but if you are serious > about making changes to them, then I recommend working with > https://github.com/woodbri/address-standardizer because I rewrote these > function to be easier to modify. The existing code is very hard to > understand and very difficult to make changes to without breaking > things. My new code is hopefully well documented in the repository and > should be straight forward to work with. FYI, I have never built/tested > with PostgreSQL 12+ so it might need some changes to support that, but > it should be good for 9, 10 and 11 versions. > > -Steve > > > > > Regards, > > > > Shao > > > > On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge > > <[email protected] <mailto:[email protected]>> > > wrote: > > > > On 4/25/2020 7:19 PM, Shaozhong SHI wrote: > > > Hi, Steve, > > > > > > Many thanks. Please send me the link to parse_address() and > > > standardize_address(). > > > > If you already have postGIS installed then > > > > create extension address_standardizer; > > > > # \df parse_address > > List of functions > > Schema | Name | Result data type | Argument data types > > | Type > > > > --------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+-------- > > public | parse_address | record | text, OUT num text, OUT > > street text, OUT street2 text, OUT address1 text, OUT city text, OUT > > state text, OUT zip text, OUT zipplus text, OUT country text | normal > > (1 row) > > > > # select * from parse_address('123-2 main street city ny'); > > num | street | street2 | address1 | city | state > > | zip > > | zipplus | country > > > > -------+-------------+---------+-------------------+------+-------+-----+---------+--------- > > 123-2 | main street | | 123-2 main street | city | NY | > > | | US > > (1 row) > > > > # \df standardize* > > List of > functions > > Schema | Name | Result data type | > > Argument data types | Type > > > > --------+---------------------+------------------+---------------------------------------------------------------+-------- > > public | standardize_address | stdaddr | lextab text, > > gaztab > > text, rultab text, address text | normal > > public | standardize_address | stdaddr | lextab text, > > gaztab > > text, rultab text, micro text, macro text | normal > > > > You need tables for the lexicon, gazetteer, and rules, which > > should be > > included in the extension but I'm not seeing them. So you can grab > > these > > from: > > > > > https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql > > > https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql > > > https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql > > > > and load them like: > > > > psql mydb -f us-gaz.sql > > psql mydb -f us-lex.sql > > psql mydb -f us-rules.sql > > > > # select * from standardize_address('lex', 'gaz', 'rules', '123-2 > > main > > street city ny'); > > building | house_num | predir | qual | pretype | name | suftype | > > sufdir | ruralroute | extra | city | state | country | postcode > > | box > > | unit > > > > ----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------ > > | 123 | | | | 2 MAIN | STREET > > | | | | CITY | NEW YORK | USA | | | > > (1 row) > > > > > > This is a good example of why parsing addresses is so difficult. The > > rules for standardize_address do not account for a house number like > > "123-2", but the regexp in parse_address do handle it. It is easy > > to get > > the 80% right and very hard to get it much above that. > > > > -Steve > > > > > > > > > > I need to find these first before test-running. > > > > > > Regards, > > > > > > Shao > > > > > > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge > > > <[email protected] > > <mailto:[email protected]> > > <mailto:[email protected] > > <mailto:[email protected]>>> > > > wrote: > > > > > > Shao, > > > > > > '^( *Building *[0-9]+)?[- 0-9]*' > > > > > > or something like that should do it. But I think you will > > find that a > > > more robust solution is to use parse_address() and/or > > > standardize_address() as they will recognize a lot of other > > address > > > constructs, like "apt 3a" for for example. > > > > > > parse_address() that a text field and breaks it into "house > > number > > > street name" and "city state zip", but only works well in North > > > America. > > > > > > standardize_address() that comes with postGIS, breaks the > > address > > > down > > > into its components and can separate out things like > > buildings, and > > > apartment/unit specifiers so you can then take the fields > > you are > > > interested in and recombine just them in a new string. > > Again, this > > > works > > > best in North America. > > > > > > My github address-standardizer is built to recognize address > > for most > > > counties, but it can also be configured to recognize address > > > standards > > > for any county without too much effort. It compiles and > > installs as > > > postgresql extension. > > > > > > Addresses are generally very messy and unless your addresses > > are vary > > > simple you will be constantly fighting with this or that > > exception. > > > > > > -Steve > > > > > > On 4/25/2020 2:55 PM, Shaozhong SHI wrote: > > > > Is there a way to left trim including the building and > number? > > > > > > > > Building 3 21-1 Great Avenue, a city, a > > country, this > > > > planet. > > > > > > > > How to take way those things which are too local to an > > address? > > > > > > > > Regards, > > > > > > > > Shao > > > > > > > > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI > > > <[email protected] <mailto:[email protected]> > > <mailto:[email protected] <mailto:[email protected]>> > > > > <mailto:[email protected] > > <mailto:[email protected]> <mailto:[email protected] > > <mailto:[email protected]>>>> > > > wrote: > > > > > > > > I find this is a simple, but important question. > > > > > > > > How best to split numbers and the rest of address? > > > > > > > > For instance, one tricky one is as follows: > > > > > > > > 21-1 Great Avenue, a city, a country, this planet > > > > > > > > How to turn this into the following: > > > > > > > > column 1, column 2 > > > > > > > > 21-1 Great Avenue, a city, a country, > > this planet > > > > > > > > Note: there is a hyphen in 21-1 > > > > > > > > Any clue? > > > > > > > > Regards, > > > > > > > > Shao > > > > > > > > > > > > _______________________________________________ > > > > postgis-users mailing list > > > > [email protected] > > <mailto:[email protected]> > > <mailto:[email protected] > > <mailto:[email protected]>> > > > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > > > > _______________________________________________ > > > postgis-users mailing list > > > [email protected] > > <mailto:[email protected]> > > <mailto:[email protected] > > <mailto:[email protected]>> > > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > > > > > > > _______________________________________________ > > > postgis-users mailing list > > > [email protected] <mailto: > [email protected]> > > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > > _______________________________________________ > > postgis-users mailing list > > [email protected] <mailto:[email protected]> > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > > > > _______________________________________________ > > postgis-users mailing list > > [email protected] > > https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
