I know you are on the right track: standardize first, then deduplicate. In my experience, SQL is not a great language for this kind of text manipulation, and standardizing sometimes requires some rather complex logic. You should be able to easily justify the cost of a commercial application to standardize your mailing list. Any good product should be able to handle more situations than you are likely to have time to code for, and it is probably kept up to date with the "official" postal service databases for the country (or countries) you need. It should save you both time (because you don't have to write it) and frustration (because you won't be the one stuck providing support).
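To illustrate the "standardize, then deduplicate" idea, here is a minimal Python sketch. It assumes a small hand-written suffix and direction table (hypothetical, covering only a few of the abbreviations in the USPS standards) and treats two addresses as duplicates when their standardized forms match exactly. It is a toy that shows the shape of the problem, not a substitute for a real standardization product. For example, it will wrongly turn "St" in "St Louis" into a street suffix.

```python
import re

# Hypothetical, deliberately tiny lookup tables; the real USPS
# abbreviation lists are far longer and have many special cases.
SUFFIXES = {
    "road": "RD", "rd": "RD",
    "street": "ST", "st": "ST",
    "avenue": "AVE", "ave": "AVE", "av": "AVE",
    "drive": "DR", "dr": "DR",
}
DIRECTIONS = {
    "north": "N", "n": "N", "south": "S", "s": "S",
    "east": "E", "e": "E", "west": "W", "w": "W",
}


def standardize(address: str) -> str:
    """Return an upper-case, abbreviated form of an address line."""
    # Lower-case, then split on anything that isn't a letter, digit or '#',
    # which also discards periods and commas ("rd." -> "rd").
    tokens = re.findall(r"[a-z0-9#]+", address.lower())
    out = []
    for tok in tokens:
        if tok in SUFFIXES:
            out.append(SUFFIXES[tok])
        elif tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])
        else:
            out.append(tok.upper())
    return " ".join(out)


def deduplicate(addresses):
    """Keep the first original spelling for each standardized address."""
    seen = {}
    for addr in addresses:
        seen.setdefault(standardize(addr), addr)
    # dicts preserve insertion order, so the result follows the input order
    return list(seen.values())
```

For instance, "123 Main Road" and "123 main rd." both standardize to "123 MAIN RD", so only the first spelling survives deduplication. Every rule a table like this misses is another case you would have to code and support yourself.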
Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Andrew Kuebler" <[EMAIL PROTECTED]> wrote on 07/21/2004 10:04:17 AM:

> I know this is not necessarily a MySQL question, but everyone on this
> listserv is always so helpful, and I was wondering if anyone had any
> pointers on how to deduplicate a list of mailing addresses, since there can
> be so many inconsistencies in how an address can be written (road vs. rd vs.
> rd., etc.). I was thinking that I probably needed to find a way to
> standardize the addresses using USPS standards and then deduplicate. Am I
> on the right track, or is there a trick I haven't found? Thank you.
>
> Best Regards,
> Andrew