I know you are on the right track: standardize first, then deduplicate.

In my experience, SQL is not a great language for this kind of text
manipulation, and standardizing sometimes requires rather complex logic.
You should be able to easily justify the cost of a commercial app to
standardize your mailing list. Any good product should be able to deal
with more situations than you are likely to have time to code for, and it
is probably up to date with the "official" postal service databases for
the country or countries you need. It should save both time (because you
don't have to write it) and frustration (as you won't be the one stuck
providing support).
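
If you do end up rolling a first pass yourself, here is a rough sketch of
the "standardize, then deduplicate" idea outside of SQL, in Python. The
abbreviation table and the function names (SUFFIXES, standardize,
deduplicate) are made up for illustration and are nowhere near complete.
A real table would be built from the postal service's own list (for the
US, USPS Publication 28), and this handles none of the hard cases such as
unit numbers, directionals, or misspellings.

```python
import re

# Hypothetical, deliberately tiny abbreviation table; a real one would be
# far larger and would come from the postal service's published standards.
SUFFIXES = {
    "road": "RD",
    "rd": "RD",
    "street": "ST",
    "st": "ST",
    "avenue": "AVE",
    "ave": "AVE",
}


def standardize(address):
    """Return a crude canonical form of a single-line address."""
    # Drop punctuation so that "rd." and "rd" compare equal.
    cleaned = re.sub(r"[^\w\s]", "", address)
    words = cleaned.lower().split()
    # Map known suffixes to one spelling; upper-case everything else.
    return " ".join(SUFFIXES.get(word, word.upper()) for word in words)


def deduplicate(addresses):
    """Keep the first address seen for each standardized form."""
    seen = {}
    for address in addresses:
        seen.setdefault(standardize(address), address)
    return list(seen.values())


if __name__ == "__main__":
    sample = ["123 Main Road", "123 main rd.", "123 Main Rd", "45 Oak Street"]
    for address in deduplicate(sample):
        print(address)
```

With the sample list above, the three spellings of "123 Main Road"
collapse to one entry and "45 Oak Street" is kept. Once you have a
standardized column, the deduplication itself is the easy part and could
just as well be done back in MySQL.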

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Andrew Kuebler" <[EMAIL PROTECTED]> wrote on 07/21/2004 10:04:17 AM:

> I know this is not necessarily a MySQL question, but everyone on this
> listserv is always so helpful and I was wondering if anyone had any
> pointers on how to deduplicate a list of mailing addresses, since there
> can be so many inconsistencies in how an address can be written (road vs
> rd vs rd., etc.). I was thinking that I probably needed to find a way to
> standardize the addresses using USPS standards and then deduplicate. Am I
> on the right track, or is there a trick somewhere I haven't found? Thank
> you.
> 
> 
> Best Regards,
> Andrew
> 
