[ https://issues.apache.org/jira/browse/LANG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361684#comment-14361684 ]

Thomas Neidhart commented on LANG-935:
--------------------------------------

Ok, my bad, I misread the line.

The problem I have with this patch is that it tries to optimize a very specific
use case (the StringEscapeUtils.escapeXXX methods) but may lead to worse
performance in other use cases.

The benchmark is also very limited, as it tests only a single usage of a
LookupTranslator.

One peculiarity of the escapeXXX methods is that they only escape single
characters, so we could easily handle this case in the LookupTranslator by
caching 1-char translations in a separate map keyed by character and handling
them separately. The speedup would be the same as with your solution (I
benchmarked it).
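A rough sketch of that single-character fast path, written as a standalone class rather than as a change to the real LookupTranslator. The class name FastPathLookupTranslator and its String[][] constructor are hypothetical. 1-char keys go into a map keyed by Character; longer keys keep a longest-match search, which is tried first so that a longer key still wins over a 1-char key with the same first character. For the escape tables, which contain only 1-char keys, that loop never runs and each input char costs a single hash lookup.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not the Commons Lang API: 1-char translations are kept
 * in a Character-keyed map, longer ones in a String-keyed map searched by
 * longest match.
 */
final class FastPathLookupTranslator {
    private final Map<Character, String> singleChar = new HashMap<>();
    private final Map<String, String> multiChar = new HashMap<>();
    private int longest = 0; // length of the longest multi-char key (0 if none)

    FastPathLookupTranslator(String[][] lookup) {
        for (String[] pair : lookup) {
            String key = pair[0];
            if (key.length() == 1) {
                singleChar.put(key.charAt(0), pair[1]);
            } else {
                multiChar.put(key, pair[1]);
                longest = Math.max(longest, key.length());
            }
        }
    }

    String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Longer keys first, so "ab" beats "a"; a no-op when there are none.
            int consumed = matchMulti(input, i, out);
            if (consumed == 0) {
                char c = input.charAt(i);
                String replacement = singleChar.get(c); // the fast path
                if (replacement != null) {
                    out.append(replacement);
                } else {
                    out.append(c);
                }
                consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }

    /** Tries multi-char keys from longest to shortest; returns chars consumed. */
    private int matchMulti(String input, int index, StringBuilder out) {
        int max = Math.min(longest, input.length() - index);
        for (int len = max; len >= 2; len--) {
            String replacement = multiChar.get(input.substring(index, index + len));
            if (replacement != null) {
                out.append(replacement);
                return len;
            }
        }
        return 0;
    }
}
```

For example, with the table {"&", "&amp;"}, {"<", "&lt;"}, {">", "&gt;"}, translating "a<b & c" yields "a&lt;b &amp; c", and an unescape table such as {"&amp;", "&"}, {"&lt;", "<"} still works through the multi-char path.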

Now, LookupTranslator is a public class, so users might use it for their own
translations. Imagine someone creates a LookupTranslator that maps arbitrary
strings to other strings. With the patch, performance might drop if many
equally sized keys sharing the same first character are put into the
translator: every one of those translations then has to be tested whenever
such a character is encountered.
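To illustrate the concern, here is a sketch of a lookup indexed by first character. It does not reproduce the patch itself; the class FirstCharIndexTranslator and its comparisons counter are hypothetical, and the only assumption is that keys are bucketed by their first character and each bucket is scanned linearly. With 1000 four-character keys "x000" to "x999", translating "x999" costs 1000 comparisons, and any position holding an 'x' that matches nothing costs the same.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical first-character index: each key sits in a bucket for its first
 * char and the bucket is scanned in insertion order (first match wins, which is
 * unambiguous when all keys have the same length).
 */
final class FirstCharIndexTranslator {
    private final Map<Character, List<String[]>> buckets = new HashMap<>();
    long comparisons = 0; // counts candidate keys tested, to expose crowded buckets

    FirstCharIndexTranslator(String[][] lookup) {
        for (String[] pair : lookup) {
            buckets.computeIfAbsent(pair[0].charAt(0), c -> new ArrayList<>()).add(pair);
        }
    }

    String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            List<String[]> candidates = buckets.get(input.charAt(i));
            int consumed = 0;
            if (candidates != null) {
                for (String[] pair : candidates) { // linear scan of the whole bucket
                    comparisons++;
                    if (input.startsWith(pair[0], i)) {
                        out.append(pair[1]);
                        consumed = pair[0].length();
                        break;
                    }
                }
            }
            if (consumed == 0) { // no key matched: copy the char through
                out.append(input.charAt(i));
                consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }
}
```

A table built with String.format("x%03d", k) for k from 0 to 999 puts every key into the 'x' bucket, which is exactly the crowded-bucket case described above.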

> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>
>                 Key: LANG-935
>                 URL: https://issues.apache.org/jira/browse/LANG-935
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.translate.*
>    Affects Versions: 3.1
>            Reporter: Peter Wall
>            Priority: Minor
>              Labels: performance
>             Fix For: Patch Needed
>
>         Attachments: tempproject1.zip
>
>
> The escape functions for HTML etc. use the same code and the same 
> initialisation tables for the escape and unescape functions, and while this 
> is an elegant approach it leads to a number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific 
> representation (for example HTML 0x101 should become &#257;)
> The proposal is to use a new mapping technique to address these issues
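Points 2 and 3 of the description can be sketched as follows. This is hypothetical code, not the StringEscapeUtils implementation: the class HtmlEscapeSketch, its four named entities, and the choice to write every character above 0x7E as a decimal numeric character reference are assumptions. The input is scanned first and returned unchanged if nothing needs escaping; otherwise characters without a named entity become &#N;, so U+0101 (ā) becomes &#257;.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an escaper that avoids copying and emits numeric references. */
final class HtmlEscapeSketch {
    // Assumed minimal set of named entities; a real table would be larger.
    private static final Map<Character, String> NAMED = new HashMap<>();
    static {
        NAMED.put('&', "&amp;");
        NAMED.put('<', "&lt;");
        NAMED.put('>', "&gt;");
        NAMED.put('"', "&quot;");
    }

    // Assumption: anything above 0x7E is escaped; control chars are left alone.
    private static boolean needsEscape(char c) {
        return c > 0x7E || NAMED.containsKey(c);
    }

    static String escape(String input) {
        int first = -1;
        for (int i = 0; i < input.length(); i++) {
            if (needsEscape(input.charAt(i))) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return input; // point 2: nothing to escape, so no new string is built
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        out.append(input, 0, first); // copy the untouched prefix in one go
        for (int i = first; i < input.length(); ) {
            int cp = input.codePointAt(i); // handles surrogate pairs as one char
            String named = cp < 0x10000 ? NAMED.get((char) cp) : null;
            if (named != null) {
                out.append(named);
            } else if (cp > 0x7E) {
                out.append("&#").append(cp).append(';'); // point 3: numeric reference
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Returning the same String instance when nothing changes is what avoids the extra allocation; callers can even check for it with a reference comparison.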



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
