[ 
https://issues.apache.org/jira/browse/CODEC-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031233#comment-14031233
 ] 

michael tobias commented on CODEC-187:
--------------------------------------

I agree that you probably don't want to add TOO many versions to the code, 
because of code size issues.

I have done only very limited testing of the Codec implementation.  It APPEARS 
that the EXACT algorithm is working fine, and so is SEPHARDIC APPROX.  GENERIC 
APPROX APPEARS to be missing some tokens, which you argue might not require 
re-indexing, but the ASHKENAZI APPROX results are just downright WRONG, and 
anybody who has created such tokens/indexes is not getting good results.
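
To make the difference between "missing some tokens" and "downright wrong" 
concrete, the tokens from the Codec can be diffed against the tokens from the 
reference BMPM implementation.  The following is only a minimal sketch: it 
assumes single-word output in the pipe-separated form, the class name 
TokenDiff is hypothetical, and the sample strings are placeholders, not real 
BMPM output.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Commons Codec.
final class TokenDiff {

    // Split a pipe-separated token string into a set, ignoring empty entries.
    static Set<String> parse(String encoded) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : encoded.split("\\|")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    // Tokens the reference produces that the implementation under test does not.
    static Set<String> missing(String reference, String actual) {
        Set<String> result = parse(reference);
        result.removeAll(parse(actual));
        return result;
    }

    // Tokens the implementation under test produces that the reference does not.
    static Set<String> unexpected(String reference, String actual) {
        Set<String> result = parse(actual);
        result.removeAll(parse(reference));
        return result;
    }

    public static void main(String[] args) {
        // Placeholder strings for illustration only.
        String reference = "abc|abd|abe";
        String actual = "abc|abx";
        System.out.println("missing:    " + missing(reference, actual));
        System.out.println("unexpected: " + unexpected(reference, actual));
    }
}
```

A result with only "missing" tokens means the index is incomplete but still 
matches a subset of the reference; any "unexpected" tokens mean the index 
contains entries the reference would never generate.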

The BMPM rules and algorithm are fairly stable now, so I am not sure whether 
it will ever be necessary to implement further versions in the Codec.

Can I suggest that you consider a 2-version approach?  The first version would 
be the existing faulty code, kept for backwards compatibility with existing 
indexes, and the second would be BMPM 3.02, the most current version and 
likely to remain adequate for the foreseeable future.
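
Purely as an illustration of what a 2-version switch could look like, here is 
a small sketch.  All names in it (BmpmRuleVersion, VersionedPhoneticEncoder) 
are hypothetical and are not Commons Codec API, and the two engines are 
stand-in functions rather than real rule sets.  The one design point it shows 
is that the legacy behaviour stays the default, so existing indexes keep 
working unless a caller opts in to 3.02.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical version switch; not an existing Commons Codec type.
enum BmpmRuleVersion { LEGACY, V3_02 }

// Hypothetical facade that delegates to one of two encoding engines.
final class VersionedPhoneticEncoder {

    private final Map<BmpmRuleVersion, UnaryOperator<String>> engines =
            new EnumMap<>(BmpmRuleVersion.class);

    // Default to the legacy rules so existing indexes stay compatible.
    private BmpmRuleVersion version = BmpmRuleVersion.LEGACY;

    VersionedPhoneticEncoder(UnaryOperator<String> legacy, UnaryOperator<String> v302) {
        engines.put(BmpmRuleVersion.LEGACY, Objects.requireNonNull(legacy));
        engines.put(BmpmRuleVersion.V3_02, Objects.requireNonNull(v302));
    }

    void setVersion(BmpmRuleVersion version) {
        this.version = Objects.requireNonNull(version);
    }

    BmpmRuleVersion getVersion() {
        return version;
    }

    String encode(String name) {
        return engines.get(version).apply(name);
    }

    public static void main(String[] args) {
        // Stand-in engines that only tag the input; real engines would apply rules.
        VersionedPhoneticEncoder encoder =
                new VersionedPhoneticEncoder(s -> "legacy:" + s, s -> "v3.02:" + s);
        System.out.println(encoder.encode("Tobias"));
        encoder.setVersion(BmpmRuleVersion.V3_02);
        System.out.println(encoder.encode("Tobias"));
    }
}
```

Anyone re-indexing would set the version to V3_02 explicitly at both index 
time and query time, since tokens from the two versions are not expected to 
match each other.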

Michael

 

> Beider Morse Phonetic Matching producing incorrect tokens
> ---------------------------------------------------------
>
>                 Key: CODEC-187
>                 URL: https://issues.apache.org/jira/browse/CODEC-187
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: michael tobias
>            Priority: Minor
>             Fix For: 1.10
>
>         Attachments: CODEC-187.patch
>
>
> I believe the Beider Morse Phonetic Matching algorithm was added in Commons 
> Codec 1.6.
> The BMPM algorithm is an EVOLVING algorithm that is currently on version 3.02, 
> though it had been static since version 3.01 dated 19 Dec 2011 (it was first 
> available as open source as version 1.00 on 6 May 2009).
> I can see nothing in the Commons Codec docs to say which version of BMPM was 
> implemented, so I am not sure whether the problem with the algorithm as coded 
> in the Codec is simply that it is an old version or whether there are more 
> basic problems with the implementation.
> How do I determine the version of the algorithm that was implemented in the 
> Commons Codec?
> How do we ensure that the algorithm is updated if/when the BMPM algorithm 
> changes?
> How do we ensure that the algorithm as coded in the Commons Codec is accurate 
> and working as expected?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
