The only "better" solution I can think of is to map the characters to their non-accented equivalents. While I think it's important to state that the default Soundex implementation is for English words, it would be nice to accommodate words with accented characters.
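To make the mapping idea concrete, here is a rough sketch. It assumes a Java runtime that provides java.text.Normalizer, which is not something codec relies on, and the class and method names (AccentFolding, stripAccents) are made up for illustration. It decomposes each accented character into a base letter plus combining marks, then drops the marks.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Commons Codec.
final class AccentFolding {

    // Maps accented characters to their non-accented base letters.
    public static String stripAccents(String input) {
        // NFD splits e.g. "e with acute" into "e" followed by a combining accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining diacritical marks, keeping the base letters.
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // \u00f6 is o with umlaut, \u00e7 is c with cedilla.
        System.out.println(stripAccents("G\u00f6the"));   // prints Gothe
        System.out.println(stripAccents("fa\u00e7ade"));  // prints facade
    }
}
```

A codec could run its input through something like this before encoding. Letters that have no decomposition (o with stroke, for example) would pass through unchanged, so this alone would not settle what to do with unmapped characters.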
My bigger concern is that the behavior is inconsistent between Soundex, Metaphone, and DoubleMetaphone. Soundex will now throw an IllegalArgumentException, whereas Metaphone passes the "bad" character through. DoubleMetaphone supports two accented characters, C with cedilla and N with tilde. Since I think the language codecs should be swappable components, their handling of such input should be consistent: a String passed to any of the codecs should throw an exception in all of them or in none.

Just my 2 cents.

-----Original Message-----
From: Gary Gregory [mailto:[EMAIL PROTECTED]
Sent: Sunday, May 23, 2004 8:37 PM
To: Jakarta Commons Developers List
Subject: [codec] Soudex issue with accented character.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=29080

Currently, "ö" or "é" in a String causes Soundex to throw an ArrayIndexOutOfBoundsException.

We can either:

(1) Throw a better Exception, like IllegalArgumentException: Only 'plain' letters are allowed.

Or:

(2) Ignore unmapped characters. This would work for "ö" and "é" since vowels are ignored, but it could cause bad encoding values for other chars like "ç".

AFAIK, you cannot ask if a character is a vowel or not.

Thoughts?
Gary

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]