Re: [SQL] extracting location info from string

Andrej Wed, 25 May 2011 15:21:25 -0700

On 26 May 2011 09:13, Tarlika Elisabeth Schmitz
<[email protected]> wrote:
> On Wed, 25 May 2011 09:25:48 -0600
> Rob Sargent <[email protected]> wrote:
>
>>
>>
>>On 05/24/2011 10:57 AM, Lew wrote:
>>> Tarlika Elisabeth Schmitz wrote:
>>>
>>>> CREATE TABLE person
>>>> (
>>>> id integer NOT NULL,
>>>> "name" character varying(256) NOT NULL,
>>>> "location" character varying(256),
>>>> CONSTRAINT person_pkey PRIMARY KEY (id)
>>>> );
>>>>
>>>> this was just a TEMPORARY table I created for quick analysis
>>>> of my CSV data (now renamed to temp_person).
>
> CREATE TABLE country
> (
>  id character varying(3) NOT NULL, -- alpha-3 code
>  "name" character varying(50) NOT NULL,
>  CONSTRAINT country_pkey PRIMARY KEY (id)
> );
>
>
>>To minimize the ultimately quite necessary human adjudication, one
>>might make good use of what is often termed "crowd sourcing":  Keep
>>all the distinct "hand entered" values and a map to the final human
>>assessment.
>
> I was wondering how to do just that. I don't think it would be a good
> idea to hard code this into the clean-up script. Take, for instance,
> variations of COUNTRY.NAME spelling. Where would I store these?
>
> I could do with a concept for this problem, which applies to a lot of
> string-type info.


I'd start w/ downloading a list as mentioned here:
http://answers.google.com/answers/threadview?id=596822

And run it through a wee perl script using
http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
to make phonetic matches ...

Then I'd run your own data through DoubleMetaphone, and clean up
matches if not too many false positives show up.



Cheers,
Andrej

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] extracting location info from string

Reply via email to