[Codec] accented character soundex revisited

Chris Black Wed, 15 Feb 2006 13:28:56 -0800

Over 18 months ago there was a thread on this list about the proper handling of accented characters in the Soundex encoder in commons-codec, but it never seemed to get resolved. In addition, there are still failing unit tests that reference this issue in the current version of the code. As someone who uses this code, I'd like to see all unit tests passing, so I've done some investigation.

As a refresher, there were three options discussed for the behavior of the Soundex codec when it sees an accented character:

1) Throw an IllegalArgumentException
2) Drop it silently
3) Replace it with the equivalent unaccented character
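
For illustration, the three options could be sketched roughly as follows. This is not the commons-codec code: the class and method names (AccentPolicySketch, cleanStrict, cleanDropping, cleanFolding) are made up, "accented" is approximated as any letter outside A-Z, and option 3 relies on java.text.Normalizer, which needs Java 6 or later.

```java
import java.text.Normalizer;

// Hypothetical sketch of the three options; not the commons-codec implementation.
class AccentPolicySketch {

    // Assumption: "unaccented" means one of the 26 basic Latin letters.
    static boolean isPlainLetter(char ch) {
        return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    }

    // Option 1: throw on any letter outside A-Z; non-letters are still dropped.
    static String cleanStrict(String str) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isPlainLetter(ch)) {
                kept.append(Character.toUpperCase(ch));
            } else if (Character.isLetter(ch)) {
                throw new IllegalArgumentException("Unsupported character: " + ch);
            }
        }
        return kept.toString();
    }

    // Option 2: silently drop everything that is not a plain letter.
    static String cleanDropping(String str) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isPlainLetter(ch)) {
                kept.append(Character.toUpperCase(ch));
            }
        }
        return kept.toString();
    }

    // Option 3: decompose accented letters into base letter + combining mark,
    // then drop the marks (and anything else that is not a plain letter).
    static String cleanFolding(String str) {
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        return cleanDropping(decomposed);
    }

    public static void main(String[] args) {
        String name = "Jos\u00E9"; // "Jose" with an acute accent on the e
        System.out.println(cleanDropping(name));
        System.out.println(cleanFolding(name));
    }
}
```

In this sketch, letters that have no canonical decomposition (for example the Scandinavian o with stroke) are still dropped by option 3, so a real patch would need to decide how to treat them.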

Right now the code drops it silently, but the unit tests are expecting an IllegalArgumentException. The code in Soundex.map(char ch) seems to be trying to throw this exception, but it will never happen because the characters passed to it from Soundex.soundex are from a String that has gone through SoundexUtils.clean(String str), which removes all characters that fail a Character.isCharacter(char ch) check (accented chars fail this check, I, erm, checked). This means if we want to throw an IllegalArgumentException it must be done in SoundexUtils.clean, not Soundex.map.
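
To make the unreachable throw concrete, here is a minimal sketch of the call order described in the previous paragraph. The method names echo the ones mentioned there, but the bodies are stand-ins and not the real commons-codec code: clean is assumed to keep only A-Z, codes is a made-up helper, and the mapping string is the usual US English Soundex table.

```java
// Hypothetical stand-ins for Soundex.map and SoundexUtils.clean; not the real code.
class UnreachableThrowSketch {

    // Standard US English Soundex codes for the letters A..Z.
    static final String MAPPING = "01230120022455012623010202";

    // Assumed filter: keeps only basic Latin letters and upper-cases them.
    static String clean(String str) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) {
                kept.append(Character.toUpperCase(ch));
            }
        }
        return kept.toString();
    }

    // Rejects any character that falls outside the A..Z mapping table.
    static char map(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= MAPPING.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return MAPPING.charAt(index);
    }

    // Made-up helper showing the call order: clean first, then map each character.
    static String codes(String str) {
        String cleaned = clean(str);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cleaned.length(); i++) {
            out.append(map(cleaned.charAt(i)));
        }
        return out.toString();
    }
}
```

Calling map directly with an accented character throws, but codes never does, because clean has already removed anything that map would reject.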

I think either behavior 1 or behavior 2 (drop silently, which is what we currently do) would be easy to implement; the unit tests could then be changed to match, so that all unit tests in commons-codec pass.

If someone lets me know which behavior is desired, I will submit a patch. Note that behavior 2 only requires either removing the test cases or changing them to expect the same encoding as an empty string.


References:
http://issues.apache.org/bugzilla/show_bug.cgi?id=29080
http://www.mail-archive.com/commons-dev@jakarta.apache.org/msg41974.html

Best,
Chris


