[ 
https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yossi Tamari reopened CODEC-199:
--------------------------------

In the first patch I submitted, I tried to keep the changes localized to reduce 
risk. Having thought about it since, I have a better patch, which I think is 
more efficient (fewer map lookups), more correct, and simpler. On correctness: 
the HW rule is specific to the US English mapping, but it was implemented in 
the main code. I fixed this by defining a new mapping character, '#', that 
marks a silent letter, and by mapping H and W to it.
The patch is attached as better.patch.
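
To illustrate the silent-marker idea, here is a minimal standalone sketch. It 
is not the contents of better.patch and does not use the Commons Codec API: 
the class name, the method names, and the mapping string (the usual US English 
codes with H and W replaced by '#') are assumptions made for this example.

```java
// Hypothetical sketch of American Soundex with a silent-letter marker.
// Names and the mapping string are illustrative, not taken from the patch.
final class SilentMarkerSoundexSketch {

    // Marks a letter that is skipped and does not separate equal codes.
    private static final char SILENT = '#';

    // Codes for A..Z. '0' = vowel-like (not encoded, but separates equal
    // codes); '#' = silent (H and W in the US English mapping).
    private static final String MAPPING = "0123012#02245501262301#202";

    private static char code(char upper) {
        return MAPPING.charAt(upper - 'A');
    }

    static String encode(String name) {
        // Keep only ASCII letters, upper-cased.
        StringBuilder letters = new StringBuilder();
        for (char c : name.toCharArray()) {
            char u = Character.toUpperCase(c);
            if (u >= 'A' && u <= 'Z') {
                letters.append(u);
            }
        }
        if (letters.length() == 0) {
            return "";
        }

        char[] out = {'0', '0', '0', '0'};   // zero-padded result
        out[0] = letters.charAt(0);          // first letter is kept as-is
        char last = code(out[0]);            // code of the last non-silent letter
        int count = 1;

        for (int i = 1; i < letters.length() && count < out.length; i++) {
            char c = code(letters.charAt(i));
            if (c == SILENT) {
                continue;                    // H/W: skip, leave 'last' untouched
            }
            if (c != '0' && c != last) {
                out[count++] = c;            // new consonant code
            }
            last = c;                        // vowels reset the duplicate check
        }
        return new String(out);
    }
}
```

Because a silent letter never updates the remembered code, any number of 
consecutive H or W characters behaves the same way, and the mapping alone 
decides which letters are silent. For example, encode("Ashcraft") gives A261 
(the C after the H is dropped because it repeats the code of S), and 
encode("yhwdyt") gives Y330 (the D after "HW" is still encoded).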

> Bug in HW rule in Soundex
> -------------------------
>
>                 Key: CODEC-199
>                 URL: https://issues.apache.org/jira/browse/CODEC-199
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Yossi Tamari
>             Fix For: 1.11
>
>         Attachments: better.patch, soundex.patch
>
>
> The Soundex algorithm says that if two characters that map to the same code 
> are separated by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode(), line 191), a 
> character that is preceded by two characters that are each H or W is not 
> encoded, regardless of what the last consonant was.
> Source: http://en.wikipedia.org/wiki/Soundex#American_Soundex



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
