That's not the behavior in either the latest [codec] release or HEAD. Can you clarify where the 'standard' behavior you describe is documented? Neither the National Archives documentation nor the NIST source code describes it.

> -----Original Message-----
> From: C. Scott Ananian [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 02, 2004 11:02 AM
> To: Jakarta Commons Developers List
> Subject: RE: [codec] Soudex issue with accented character.
>
> On Wed, 2 Jun 2004, Edelson, Justin wrote:
>
> > The only "better" solution I can think of is to map the characters
> > into their non-accented equivalent. While I think it's important to
> > state that the default Soundex implementation is for English words, it
> > would be nice to accommodate words with accented characters.
>
> I believe the 'standard' behavior is just to drop the
> unaccented character from the soundex encoding. The soundex
> algorithm typically already does this for other 'quiet'
> characters. (Note that two words with accented characters
> will still match correctly even if the accented characters
> are dropped.)
>  --scott
>
> ( http://cscott.net/ )
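
For reference, here is a rough sketch of the two pre-processing options discussed in the quoted thread: mapping accented letters to their unaccented equivalents versus dropping them outright. It is not the [codec] implementation. The class and method names are made up for illustration, and it assumes the cleaned string would then be handed to an encoder such as Soundex.encode(String).

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Commons Codec.
public class AccentPreprocessing {

    // Option 1: map accented letters to their unaccented base letter.
    // NFD splits a letter such as e-acute into 'e' plus a combining mark,
    // and the regex then removes the combining marks.
    public static String mapToUnaccented(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Option 2: drop every character that is not a plain A-Z or a-z letter,
    // so accented letters simply disappear from the input.
    public static String dropNonAsciiLetters(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Berenice" with both e-acute letters written as Unicode escapes.
        String name = "B\u00e9r\u00e9nice";
        System.out.println(mapToUnaccented(name));      // Berenice
        System.out.println(dropNonAsciiLetters(name));  // Brnice
    }
}
```

Note that the two options feed different letters to the encoder, so which one is considered 'standard' matters when an accented letter stands in for a consonant. Also, NFD only helps for letters that have a canonical decomposition; characters such as 'ø' would pass through option 1 unchanged.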