On 23/05/2011 9:11 AM, Andrej wrote:
On 23 May 2011 10:00, Tarlika Elisabeth Schmitz
<postgres...@numerixtechnology.de>  wrote:
On Sun, 22 May 2011 21:05:26 +0100
Tarlika Elisabeth Schmitz<postgres...@numerixtechnology.de>  wrote:

A column contains location information, which may contain any of the
following:

1) null
2) country name (e.g. "France")
3) city name, region name (e.g. "Bonn, Nordrhein-Westfalen")
4) city name, Rg. region name (e.g. "Frankfurt, Rg. Hessen")
5) city name, Rg region name (e.g. "Frankfurt, Rg Hessen")


I also need to cope with variations of COUNTRY.NAME and REGION.NAME.
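
For illustration only, the five listed forms can be told apart with a single regular expression. This is a minimal sketch, not a solution: the function name `split_location` and the tuple layout are made up here, it treats any value without a comma as a country, and it does nothing about spelling variations of COUNTRY.NAME or REGION.NAME.

```python
import re

# Matches "city, region", "city, Rg. region" and "city, Rg region".
# The optional "Rg" / "Rg." prefix is dropped from the region name.
CITY_REGION = re.compile(
    r"^\s*(?P<city>[^,]+?)\s*,\s*(?:Rg\.?\s+)?(?P<region>.+?)\s*$"
)

def split_location(value):
    """Return (country, city, region); parts that are absent are None."""
    # Form 1: null (an empty string is treated the same way, by assumption).
    if value is None or not value.strip():
        return (None, None, None)
    # Forms 3 to 5: a comma separates city and region.
    m = CITY_REGION.match(value)
    if m:
        return (None, m.group("city"), m.group("region"))
    # Form 2: no comma, so assume the whole value is a country name.
    return (value.strip(), None, None)
```

The returned names would still have to be matched against the COUNTRY and REGION tables, which is where the variations become the real problem.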

This is a hard problem. You're dealing with free-form data that humans understand easily, but only because they draw on context and background knowledge that is very hard to give a computer.

If you want to do a good job of this, your best bet is to plug in third-party address analysis software that is dedicated to this task. Most (all?) such packages are commercial, proprietary affairs. They exist because it's really, really hard to do this right.

Another thing of great import is whether the city can occur in the
data column all by itself; if yes, it's next to impossible to distinguish
it from a country.

Not least because some places are both, e.g.:

  Luxembourg
  The Vatican
  Singapore

(The Grand Duchy of Luxembourg has other cities, but still serves as an example).
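
To make the ambiguity concrete, here is a rough sketch of what a lookup for a bare name (no comma, no region) might look like. The sets `COUNTRIES` and `CITIES` and the function `classify_bare_name` are hypothetical stand-ins; real reference data would presumably come from the COUNTRY table and a list of known cities.

```python
# Hypothetical reference data, just enough to show the overlap.
COUNTRIES = {"france", "luxembourg", "singapore"}
CITIES = {"bonn", "frankfurt", "luxembourg", "singapore"}

def classify_bare_name(name):
    """Classify a value that contains no region part."""
    key = name.strip().casefold()
    is_country = key in COUNTRIES
    is_city = key in CITIES
    if is_country and is_city:
        # Nothing in the value itself says which one was meant.
        return "ambiguous"
    if is_country:
        return "country"
    if is_city:
        return "city"
    return "unknown"
```

With this data, "France" comes back as a country and "Bonn" as a city, but "Luxembourg" and "Singapore" come back as ambiguous, and no amount of parsing fixes that without extra context.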

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)