[ https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820484#comment-13820484 ]
Gary Gregory commented on CODEC-174:
------------------------------------

It seems that enough has changed by now that the cleanup patch no longer applies:

https://issues.apache.org/jira/secure/attachment/12612839/CODEC_174_cleanup.patch

Feel free to resubmit if you see fit.

Thank you for the patches! Great performance improvements. We'll have to decide how to address the break in BC.

> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch,
> CODEC-174-delete-subsequence-cache-and-use-String.patch,
> CODEC-174-delete-subsequence-cache.patch,
> CODEC-174-refactor-join-method-in-Phoneme.patch,
> CODEC-174-refactor-restrictTo-method-in-SomeLanguages.patch,
> CODEC-174-reuse-set-in-PhonemeBuilder.patch, CODEC_174_cleanup.patch,
> TestCacheSubSequence.java, test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When it indexes a lot of documents
> using this encoder, the import time is multiplied by 30. So I have decided
> to optimize the current implementation in commons-codec.
> So far I have created two patches. The first patch deletes a "performance
> hack", a subsequence cache. This cache doesn't improve performance, and
> removing it saves a few milliseconds.
> The second patch changes the in-memory storage of the rules to a Map
> instead of a List. With it, you can look up a rule directly by the
> beginning of its pattern. This patch divides the encoding time by 2.
> I will try to find more improvements. If you have any ideas, please let me know.
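
For readers following the second patch described in the issue (storing rules in a Map keyed by the beginning of the pattern instead of scanning a List), here is a minimal sketch of the idea. It is not the code from the attached patch: the class `RuleIndexSketch`, the simplified `Rule` type, the choice of the first character as the key, and the sample patterns and phonemes are all assumptions made for illustration, and patterns are assumed to be non-empty literals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleIndexSketch {

    /** Hypothetical, simplified rule: a literal pattern and a made-up replacement. */
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    /** Groups rules by the first character of their (non-empty) pattern, keeping list order. */
    static Map<Character, List<Rule>> index(List<Rule> rules) {
        Map<Character, List<Rule>> byFirstChar = new HashMap<>();
        for (Rule rule : rules) {
            byFirstChar.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
        }
        return byFirstChar;
    }

    /** Returns the first rule whose pattern matches input at pos, or null if none does. */
    static Rule findRule(Map<Character, List<Rule>> index, String input, int pos) {
        // Only rules sharing the current character are examined, not the whole rule set.
        List<Rule> candidates = index.getOrDefault(input.charAt(pos), Collections.emptyList());
        for (Rule rule : candidates) {
            if (input.startsWith(rule.pattern, pos)) {
                return rule;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("sch", "S")); // sample data, not real Beider-Morse rules
        rules.add(new Rule("s", "s"));
        rules.add(new Rule("a", "a"));

        Map<Character, List<Rule>> index = index(rules);
        Rule match = findRule(index, "schmidt", 0);
        System.out.println(match == null ? "no rule" : match.pattern + " -> " + match.phoneme);
    }
}
```

The lookup cost at each input position then depends on the number of rules that share the current first character rather than on the total number of rules, which is the kind of saving the issue describes. Rules within a bucket keep their original relative order, so first-match behaviour is preserved under these assumptions.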