Using Lucene to find duplicate/similar names

Andy DePue Wed, 16 Apr 2008 09:08:14 -0700

I'm new to Lucene, and would like to use it to find duplicate (or similar) names in a contact list. Is Lucene a good fit?

We have a form where a user enters a company or person's name, and we want the system to warn them if there is already a company or person entered with the same or similar name.

Based on the little I know of Lucene, I'm thinking an NGram algorithm (based on characters, not words) would work best... but I'm not sure if Lucene takes proximity or edit distances into account? For example, say you have these two names:

 Andrew John
 John Andrew

If a user enters Andy John, without proximity or edit distance, these two names will match about the same, while, obviously, the first name should be ranked higher.
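To make the concern concrete, here is a minimal sketch in plain Python. It is not Lucene code, and the helper names `bigrams`, `dice`, and `levenshtein` are hypothetical, chosen only for illustration. It assumes character bigrams are taken per word (so word order is ignored) and scored with a Dice coefficient, then compares that with a standard Levenshtein edit distance over the whole string.

```python
def bigrams(name):
    """Set of character bigrams, taken per word so word order is ignored (an assumption)."""
    grams = set()
    for word in name.lower().split():
        grams.update(word[i:i + 2] for i in range(len(word) - 1))
    return grams


def dice(a, b):
    """Dice coefficient over the two bigram sets: 2 * |A & B| / (|A| + |B|)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


query = "Andy John"
for candidate in ("Andrew John", "John Andrew"):
    print(candidate, round(dice(query, candidate), 3), levenshtein(query, candidate))
```

Under these assumptions both stored names get the same bigram score (10/14, about 0.71), because they contain exactly the same per-word bigrams. The edit distance does separate them: it is 3 for Andrew John and larger for John Andrew.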

Thanks in advance for any help or advice.

