On Wed, 16 Apr 2008, Andreas <[EMAIL PROTECTED]> writes: > how can I find double entries in varchar columns where the content is > not 100% identical because of a spelling error or the person > considered it "looked nicer" that way? > > I'd like to identify and then merge records of e.g. 'google', > gogle', 'guugle' > > Then I want to match abbrevations like 'A-Company Ltd.', 'a company > ltd.', 'A-Company Limited' > > Is there a way to do this? > It would be OK just to list candidats up to be manually checked > afterwards.
You can try something similar to below example. (levenshtein(text, text) function is supplied by fuzzystrmatch module.) SELECT T1.col, T2.col FROM tbl AS T1, INNER JOIN tbl AS T2 ON T1.col <> T2.col AND levenshtein(T1.col, T2.col) < (length(T1.col) * 0.5) Regards. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql