[ https://issues.apache.org/jira/browse/LANG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361684#comment-14361684 ]
Thomas Neidhart commented on LANG-935:
--------------------------------------
OK, my bad, I misread the line.
The problem I have with this patch is that it tries to optimize a very
specific use case (the StringEscapeUtils.escapeXXX methods) but may lead to
worse performance in other use cases.
The benchmark is also very limited, as it tests only one example of how a
LookupTranslator can be used.
One peculiarity of the escapeXXX methods is that they escape only single
characters, so we could easily handle this case in the LookupTranslator by
caching 1-char translations in a separate map keyed by character and handling
them differently. The speedup would be the same as for your solution (I
benchmarked it).
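
A minimal sketch of the separate-map idea described above. It is not the
actual LookupTranslator code: the class name SingleCharLookupSketch and all
of its members are hypothetical, and it assumes a simple longest-match-first
rule for multi-character keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the Commons Lang LookupTranslator. */
public class SingleCharLookupSketch {
    // 1-char keys get their own map, keyed directly by the character.
    private final Map<Character, String> singleCharMap = new HashMap<>();
    // All longer keys stay in a general string-keyed map.
    private final Map<String, String> multiCharMap = new HashMap<>();
    private int shortest = Integer.MAX_VALUE;
    private int longest = 0;

    public SingleCharLookupSketch(String[][] lookup) {
        for (String[] pair : lookup) {
            String key = pair[0];
            if (key.length() == 1) {
                singleCharMap.put(key.charAt(0), pair[1]);
            } else {
                multiCharMap.put(key, pair[1]);
                shortest = Math.min(shortest, key.length());
                longest = Math.max(longest, key.length());
            }
        }
    }

    public String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int consumed = translateAt(input, i, out);
            if (consumed == 0) {
                // No translation applies: copy the character unchanged.
                out.append(input.charAt(i));
                consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }

    /** Appends a translation if one matches at index; returns chars consumed. */
    private int translateAt(String input, int index, StringBuilder out) {
        // Try multi-character keys first, longest candidate first.
        int max = Math.min(longest, input.length() - index);
        for (int len = max; len >= shortest && len >= 2; len--) {
            String hit = multiCharMap.get(input.substring(index, index + len));
            if (hit != null) {
                out.append(hit);
                return len;
            }
        }
        // Single-character case: one map lookup, no substring allocation.
        String single = singleCharMap.get(input.charAt(index));
        if (single != null) {
            out.append(single);
            return 1;
        }
        return 0;
    }
}
```

When only 1-char keys are registered, as is assumed here for the escapeXXX
tables, the loop over multi-character lengths never runs and each input
character costs a single map lookup.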
Now, the LookupTranslator is a public class, so users might use it for their
own translations. Imagine someone created a LookupTranslator that translates
some strings to other strings. With the patch, performance might drop if many
equally sized strings that share the same first character have been put into
the translator. In this case, all of those translations have to be tested
every time such a character is encountered.
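
To illustrate the concern, here is a rough sketch that assumes the lookup
groups keys by first character and then tests the candidates in that group
one by one. This is an assumption made for illustration, not the code from
the patch; FirstCharBucketSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a first-character index; not the patch itself. */
public class FirstCharBucketSketch {
    // Keys are grouped by their first character (keys assumed non-empty).
    private final Map<Character, List<String[]>> buckets = new HashMap<>();
    // Counts how many candidate keys were compared against the input.
    private long comparisons = 0;

    public FirstCharBucketSketch(String[][] lookup) {
        for (String[] pair : lookup) {
            buckets.computeIfAbsent(pair[0].charAt(0), c -> new ArrayList<>())
                   .add(pair);
        }
    }

    /** Returns the replacement for a key starting at index, or null. */
    public String lookupAt(String input, int index) {
        List<String[]> candidates = buckets.get(input.charAt(index));
        if (candidates == null) {
            return null; // fast path: no key starts with this character
        }
        for (String[] pair : candidates) {
            comparisons++;
            if (input.startsWith(pair[0], index)) {
                return pair[1];
            }
        }
        return null; // every candidate in the bucket was tested
    }

    public long comparisons() {
        return comparisons;
    }
}
```

With 100 equally sized keys that all start with the same character, a miss
on that character costs 100 comparisons under this scheme, whereas a single
hash lookup on the candidate substring would cost one.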
> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>
> Key: LANG-935
> URL: https://issues.apache.org/jira/browse/LANG-935
> Project: Commons Lang
> Issue Type: Improvement
> Components: lang.text.translate.*
> Affects Versions: 3.1
> Reporter: Peter Wall
> Priority: Minor
> Labels: performance
> Fix For: Patch Needed
>
> Attachments: tempproject1.zip
>
>
> The escape functions for HTML etc. use the same code and the same
> initialisation tables for the escape and unescape functions, and while this
> is an elegant approach it leads to a number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific
> representation (for example HTML 0x101 should become &#x101;)
> The proposal is to use a new mapping technique to address these issues