Hi!

I have a contact application where I need to display possible
duplicates within the existing contacts. By possible duplicates I mean
different contact entries that refer to the same person and may have
the same or slightly different information (for example, typos).

What I currently do is search for different levels of duplication
(a single UNION of 3 queries):
- the first query searches for exact duplicates (exactly the same
name, address, email, phone, etc.);
- the second query searches for matches using the Soundex algorithm on
a restricted set of fields and is given a lower matching score;
- the third query applies Soundex to more fields and is given an even
lower matching score.
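
To make the tiers concrete, here is a rough sketch of the same idea
outside the database. It is Python rather than the SQL union itself,
and the field groupings per tier, the scores (100/70/40) and the helper
names (soundex, match_score, find_duplicates) are all assumptions made
for illustration, not the actual queries.

```python
from itertools import combinations

# Standard American Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(text):
    """Return the 4-character Soundex code of the letters in text."""
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


# Hypothetical tiers: (score, fields compared by Soundex, fields compared exactly).
TIERS = [
    (100, (), ("firstname", "lastname", "address", "email", "phone")),
    (70, ("firstname", "lastname"), ("address", "email", "phone")),
    (40, ("firstname", "lastname", "address"), ()),
]


def normalized(contact, field):
    return contact.get(field, "").strip().lower()


def match_score(a, b):
    """Return the score of the strictest tier that both contacts satisfy."""
    for score, phonetic, exact in TIERS:
        if (all(soundex(a.get(f, "")) == soundex(b.get(f, "")) for f in phonetic)
                and all(normalized(a, f) == normalized(b, f) for f in exact)):
            return score
    return 0


def find_duplicates(contacts):
    """Return (score, i, j) for every matching pair, best matches first."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(contacts), 2):
        score = match_score(a, b)
        if score:
            pairs.append((score, i, j))
    return sorted(pairs, reverse=True)
```

Comparing every pair like this is quadratic in the number of contacts,
so it only illustrates the scoring, not how the search would be run on
a large table.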

Is there a better algorithm or approach for this kind of fuzzy
duplicate search over multiple fields (firstname, lastname, address,
etc.)? Pointers to Wikipedia, books, etc. are appreciated.

-- 
Mack
