[ https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816393#comment-13816393 ]

Gary Gregory commented on CODEC-174:
------------------------------------

Thomas, 

Thank you for your patches.

I've applied two out of the three. See the new class 
{{org.apache.commons.codec.language.bm.PhoneticEnginePerformanceTest}}.

I did not apply 
https://issues.apache.org/jira/secure/attachment/12611962/CODEC-174-delete-subsequence-cache.patch
 because it did not improve speed and gave me very inconsistent results (see the 
Javadoc in the performance test class).

Feel free to revisit these changes and propose additional patches.

Gary

> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch, 
> CODEC-174-delete-subsequence-cache.patch, 
> CODEC-174-reuse-set-in-PhonemeBuilder.patch, test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When Solr indexes a lot of documents 
> using this encoder, the import time is multiplied by 30, so I decided 
> to optimize the current implementation in commons-codec.
> So far I have created two patches. The first patch deletes a "performance 
> hack", a subsequence cache. This cache does not improve performance, and 
> deleting it saves a few milliseconds.
> The second patch changes the in-memory storage of the rules from a List to a 
> Map. With it, a rule can be looked up directly by the beginning of its 
> pattern. This patch halves the encoding time.
> I will try to find more improvements. If you have any ideas, please let me know.
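
For readers following the second patch, here is a minimal sketch of the List-to-Map idea described above. It is not the code from the attached patch: the class RuleIndexSketch, the simplified Rule type, and the helpers indexByFirstChar and candidates are hypothetical names made up for illustration, and the sketch assumes every pattern is a non-empty literal string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleIndexSketch {

    /** Hypothetical, simplified rule: a literal pattern and the phoneme it produces. */
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    /**
     * Groups rules by the first character of their pattern.
     * Rule order is preserved within each group.
     * Assumes patterns are non-empty (an assumption of this sketch).
     */
    static Map<Character, List<Rule>> indexByFirstChar(List<Rule> rules) {
        Map<Character, List<Rule>> index = new HashMap<>();
        for (Rule rule : rules) {
            index.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    /** Returns only the rules whose pattern starts with the character at position i. */
    static List<Rule> candidates(Map<Character, List<Rule>> index, CharSequence input, int i) {
        return index.getOrDefault(input.charAt(i), Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("sch", "S"));
        rules.add(new Rule("s", "s"));
        rules.add(new Rule("a", "a"));

        Map<Character, List<Rule>> index = indexByFirstChar(rules);

        // Only the rules starting with 's' are considered at position 0.
        for (Rule r : candidates(index, "schmidt", 0)) {
            System.out.println(r.pattern + " -> " + r.phoneme);
        }
    }
}
```

With a flat List, every rule is tested at every input position. With the Map, only the rules sharing the current first character are tested, which is one plausible reason for the speedup reported above.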



--
This message was sent by Atlassian JIRA
(v6.1#6144)
