[ 
https://issues.apache.org/jira/browse/CODEC-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630167#comment-16630167
 ] 

Alex Volodko commented on CODEC-250:
------------------------------------

Regarding the ignored letters: yes, according to the algorithm they are simply 
ignored. So for the implementation: if they are removed from the input string 
before encoding, the result should be correct. I understand that the 
description of the algorithm is quite short and generic, so it leaves some 
room for interpretation. On the other hand, this algorithm is meant to compare 
words phonetically: you don't pronounce special characters, and for the German 
language (which is the main target of Cologne phonetics) I cannot come up with 
a "non-letter" character that would change the pronunciation of a word.

Regarding the handling of ß (sharp s): yes, you are completely right. Strictly 
according to the algorithm it should be handled as 'S', but both 'S' and 'SS' 
produce the same output, so in the end it doesn't matter. On the other hand, 
it also doesn't make the code any more understandable.
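
To illustrate why 'S' and 'SS' end up with the same code, here is a minimal 
sketch of the "remove adjacent duplicate digits" step on its own. The class 
and method names are made up for this example and are not part of Commons 
Codec; the digit strings are what I get by hand from the Wikipedia conversion 
table for "Strase" (827080) and "Strasse" (8270880), so treat them as an 
assumption rather than library output.

```java
final class SharpSCollapseSketch {

    private SharpSCollapseSketch() {}

    /** Collapses every run of identical adjacent digits into a single digit. */
    static String collapseRuns(String digits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            char d = digits.charAt(i);
            // Append only if this digit differs from the last one kept.
            if (out.length() == 0 || out.charAt(out.length() - 1) != d) {
                out.append(d);
            }
        }
        return out.toString();
    }
}
```

With this, collapseRuns("8270880") and collapseRuns("827080") both return 
"827080", so mapping ß to one 'S' or to two makes no difference after this step.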

> Wrong value calculated by Cologne Phonetic if a special character is placed 
> between equal letters
> -------------------------------------------------------------------------------------------------
>
>                 Key: CODEC-250
>                 URL: https://issues.apache.org/jira/browse/CODEC-250
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.5, 1.11
>            Reporter: Alex Volodko
>            Priority: Major
>
> The algorithm for Cologne phonetics is (simplified):
>  # Encode letter by letter from left to right according to the conversion 
> table.
>  # Remove all digits occurring more than once next to each other.
>  # Remove all code "0" except at the beginning.
> Characters which are not specified in the conversion table (such as hyphens) 
> are ignored. See https://en.wikipedia.org/wiki/Cologne_phonetics
> If the input is "test-test" the step results will be:
>  # 20822082
>  # 2082082
>  # 28282
> The expected result for "test-test" is therefore 28282.
> The actual result for "test-test" is 282282 (the fourth digit, an extra "2", is wrong).
> This bug is caused by the fix from
> [https://github.com/apache/commons-codec/commit/72c8759a22c6552a2dfcdf61b29729f981752879]
> and is present since 1.5
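
The three steps from the description can be sketched as follows. This is not 
the Commons Codec implementation, only a small stand-alone illustration under 
my reading of the Wikipedia conversion table; the class and method names are 
hypothetical. It assumes that characters outside the table are dropped before 
encoding (so the letters on either side of a hyphen become neighbours) and 
that H stays in the string as context for the P rule but produces no digit.

```java
final class ColognePhoneticSketch {

    private ColognePhoneticSketch() {}

    /** Upper-cases, folds umlauts and sharp s, and drops everything that is not A-Z. */
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = Character.toUpperCase(input.charAt(i));
            switch (c) {
                case '\u00C4': c = 'A'; break; // A umlaut
                case '\u00D6': c = 'O'; break; // O umlaut
                case '\u00DC': c = 'U'; break; // U umlaut
                case '\u00DF': c = 'S'; break; // sharp s
                default: break;
            }
            if (c >= 'A' && c <= 'Z') {
                sb.append(c); // hyphens, digits, spaces etc. are ignored
            }
        }
        return sb.toString();
    }

    private static boolean in(char c, String set) {
        return set.indexOf(c) >= 0;
    }

    /** Step 1 for one letter: its code, depending on the neighbouring letters. */
    static String codeAt(String s, int i) {
        char c = s.charAt(i);
        char prev = i > 0 ? s.charAt(i - 1) : '\0';
        char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
        if (in(c, "AEIJOUY")) return "0";
        if (c == 'H') return "";
        if (c == 'B') return "1";
        if (c == 'P') return next == 'H' ? "3" : "1";
        if (c == 'D' || c == 'T') return in(next, "CSZ") ? "8" : "2";
        if (in(c, "FVW")) return "3";
        if (in(c, "GKQ")) return "4";
        if (c == 'C') {
            if (i == 0) return in(next, "AHKLOQRUX") ? "4" : "8";
            if (in(prev, "SZ")) return "8";
            return in(next, "AHKOQUX") ? "4" : "8";
        }
        if (c == 'X') return in(prev, "CKQ") ? "8" : "48";
        if (c == 'L') return "5";
        if (c == 'M' || c == 'N') return "6";
        if (c == 'R') return "7";
        return "8"; // only S and Z remain
    }

    static String encode(String input) {
        String s = normalize(input);

        // Step 1: encode letter by letter from left to right.
        StringBuilder raw = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            raw.append(codeAt(s, i));
        }

        // Step 2: remove digits occurring more than once next to each other.
        StringBuilder dedup = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char d = raw.charAt(i);
            if (dedup.length() == 0 || dedup.charAt(dedup.length() - 1) != d) {
                dedup.append(d);
            }
        }

        // Step 3: remove every "0" except at the beginning.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < dedup.length(); i++) {
            char d = dedup.charAt(i);
            if (d != '0' || i == 0) {
                out.append(d);
            }
        }
        return out.toString();
    }
}
```

With this sketch, encode("test-test") and encode("testtest") both return 
"28282", which is the expected value from the description above.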



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
