I have the code for this method if someone will commit it. Basically, the higher the returned value, the better the match (which to me makes no sense for something called "difference", but that's the method's definition).

    public int difference(String a, String b) {
        String soundexa = soundex(a);
        String soundexb = soundex(b);
        // Compare the encoded strings, not the raw inputs: looping
        // over a.length() can run past the end of the 4-character code.
        int length = soundexa.length();
        int res = 0;
        // return 0 (the worst match) if the encoded lengths
        // don't match
        if (length == soundexb.length()) {
            for (int i = 0; i < length; i++) {
                if (soundexa.charAt(i) == soundexb.charAt(i)) {
                    res++;
                }
            }
        }
        return res;
    }

For regular Soundex, the result ranges from 0 (the worst) to 4 (the best). For RefinedSoundex, it ranges from 0 (the worst) to whatever the length of the encoded strings is, but the same method works for both versions.

Here's the description from the SQL Server help:

DIFFERENCE

Returns the difference between the SOUNDEX values of two character expressions as an integer.

Syntax

    DIFFERENCE ( character_expression , character_expression )

Arguments

    character_expression
        Is an expression of type char or varchar.

Return Types

    int

Remarks

The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4, with 4 indicating the SOUNDEX values are identical.

Examples

In the first part of this example, the SOUNDEX values of two very similar strings are compared, and DIFFERENCE returns a value of 4. In the second part of this example, the SOUNDEX values for two very different strings are compared, and DIFFERENCE returns a value of 0.

    USE pubs
    GO
    -- Returns a DIFFERENCE value of 4, the least possible difference.
    SELECT SOUNDEX('Green'), SOUNDEX('Greene'), DIFFERENCE('Green','Greene')
    GO
    -- Returns a DIFFERENCE value of 0, the highest possible difference.
    SELECT SOUNDEX('Blotchet-Halls'), SOUNDEX('Greene'), DIFFERENCE('Blotchet-Halls', 'Greene')
    GO

Here is the result set:

    ----- ----- -----------
    G650  G650  4

    (1 row(s) affected)

    ----- ----- -----------
    B432  G650  0

    (1 row(s) affected)

-----Original Message-----
From: Inger, Matthew [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 2:53 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [codec] Soundex / Refined Soundex

Any thoughts on the "difference" method?

-----Original Message-----
From: Gary Gregory [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 12:18 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [codec] Soundex / Refined Soundex

Hello,

Thank you for your interest in [codec].

Soundex is, well, Soundex, a method to find words with similar phonemes. Refined Soundex, OTOH, is more geared towards spellchecking.

For example:

new Soundex().encode("testing") returns "T235"
new RefinedSoundex().encode("testing") returns "T6036084"

Gary

> -----Original Message-----
> From: Inger, Matthew [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 04, 2003 09:08
> To: 'Jakarta Commons Developers List'
> Subject: [codec] Soundex / Refined Soundex
>
> Can anyone tell me the difference between these two soundex
> implementations? Also, is there any planned support for a
> difference algorithm for soundex (similar to the one provided
> by SQLServer?)
>
> We are looking for a soundex implementation to use in our
> software. Thanks in advance for your help.
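
For anyone who wants to try the scoring without the rest of the class, here is a minimal, self-contained sketch of just the comparison step. It assumes both arguments are already-encoded Soundex strings, and the class and method names are hypothetical, not part of [codec]. It follows the position-by-position count described in the SQL Server remarks quoted in this thread and does not claim to reproduce SQL Server's behavior in every case.

```java
// Hypothetical helper for illustration only; not part of [codec].
final class SoundexDifference {

    private SoundexDifference() {
    }

    /**
     * Counts the positions at which two already-encoded strings hold
     * the same character. Returns 0 (the worst match) when either
     * argument is null or the encoded lengths differ.
     */
    static int difference(String code1, String code2) {
        if (code1 == null || code2 == null
                || code1.length() != code2.length()) {
            return 0;
        }
        int matches = 0;
        for (int i = 0; i < code1.length(); i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Codes taken from the result set quoted in this thread.
        System.out.println(difference("G650", "G650")); // prints 4
        System.out.println(difference("B432", "G650")); // prints 0
    }
}
```

Because the method only looks at the encoded strings, the same sketch applies to RefinedSoundex codes, where the best possible score is the code length rather than 4.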