[jira] [Commented] (LANG-935) Possible performance improvement on string escape functions

ASF GitHub Bot (JIRA) Fri, 13 Mar 2015 07:14:58 -0700

    [ https://issues.apache.org/jira/browse/LANG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360394#comment-14360394 ]


ASF GitHub Bot commented on LANG-935:
-------------------------------------

GitHub user CodingFabian opened a pull request:

    https://github.com/apache/commons-lang/pull/50

    LANG-935 optimize lookup of translations by LookupTranslator

    The previous implementation retrieved substrings from the input and checked
    whether it had a replacement for each of them. The problem is that this
    always creates substrings (which are no longer "free" since JDK 7), even
    for substrings that obviously have no mapping.
    
    The new implementation no longer hashes substrings. Instead, it looks for
    translations that could be applied to the input.
    Usually the very first character is enough to rule out a translation, so
    the first character is the new key for the mapping table.
    
    This is twice as fast as the previous implementation and avoids a lot of
    substring allocation.
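
To illustrate the idea described above, here is a minimal sketch of a lookup keyed by first character. It is not the code from the pull request: the class name FirstCharLookupSketch and its methods are hypothetical, it assumes Java 8 and non-empty lookup keys, and it tries the longest candidate first so that overlapping keys resolve greedily.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the LookupTranslator from the pull request.
public class FirstCharLookupSketch {
    // Candidate {key, replacement} pairs grouped by the first character of the key.
    private final Map<Character, List<String[]>> byFirstChar = new HashMap<>();

    // Assumption: every key in the lookup table is non-empty.
    public FirstCharLookupSketch(String[][] lookup) {
        for (String[] pair : lookup) {
            byFirstChar.computeIfAbsent(pair[0].charAt(0), c -> new ArrayList<>()).add(pair);
        }
        // Longest keys first, so the greediest match is tried first.
        for (List<String[]> candidates : byFirstChar.values()) {
            candidates.sort((a, b) -> b[0].length() - a[0].length());
        }
    }

    public String translate(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // One map lookup on a single character; no substring is created.
            List<String[]> candidates = byFirstChar.get(input.charAt(i));
            String[] match = null;
            if (candidates != null) {
                for (String[] pair : candidates) {
                    if (regionMatches(input, i, pair[0])) {
                        match = pair;
                        break;
                    }
                }
            }
            if (match == null) {
                out.append(input.charAt(i));
                i++;
            } else {
                out.append(match[1]);
                i += match[0].length();
            }
        }
        return out.toString();
    }

    // Compares the key against the input in place, character by character.
    private static boolean regionMatches(CharSequence input, int offset, String key) {
        if (offset + key.length() > input.length()) {
            return false;
        }
        for (int k = 0; k < key.length(); k++) {
            if (input.charAt(offset + k) != key.charAt(k)) {
                return false;
            }
        }
        return true;
    }
}
```

With a table such as {"&", "&amp;"}, {"<", "&lt;"}, most input characters miss the map on the first lookup and are copied straight through, which is where the saving over hashing every substring would come from.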

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingFabian/commons-lang LANG-935

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/commons-lang/pull/50.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #50
    
----
commit d8c59abd376a8aaf24fcca46a571efffe20b7a76
Author: Fabian Lange <[email protected]>
Date:   2015-03-13T14:00:07Z

    LANG-935 optimize lookup of translations by LookupTranslator
    
    The previous implementation retrieved substrings from the input and checked
    whether it had a replacement for each of them. The problem is that this
    always creates substrings (which are no longer "free" since JDK 7), even
    for substrings that obviously have no mapping.
    
    The new implementation no longer hashes substrings. Instead, it looks for
    translations that could be applied to the input.
    Usually the very first character is enough to rule out a translation, so
    the first character is the new key for the mapping table.
    
    This is twice as fast as the previous implementation and avoids a lot of
    substring allocation.

----


> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>
>                 Key: LANG-935
>                 URL: https://issues.apache.org/jira/browse/LANG-935
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.translate.*
>    Affects Versions: 3.1
>            Reporter: Peter Wall
>            Priority: Minor
>              Labels: performance
>             Fix For: Patch Needed
>
>         Attachments: tempproject1.zip
>
>
> The escape functions for HTML etc. use the same code and the same 
> initialisation tables for the escape and unescape functions, and while this 
> is an elegant approach it leads to a number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific 
> representation (for example, HTML 0x101 should become &#257;)
> The proposal is to use a new mapping technique to address these issues
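
Points 2 and 3 of the issue can be sketched roughly as follows. This is only an illustration, not the mapping technique proposed in the attachment: the class EscapeFallbackSketch is hypothetical, the named-entity table is deliberately tiny, and the choice to emit numeric references for every code point above 0x7E is an assumption made for the example.

```java
// Hypothetical sketch, not the implementation proposed in LANG-935.
public class EscapeFallbackSketch {

    public static String escapeHtml(String input) {
        StringBuilder out = null; // allocated only once a replacement is needed
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            String replacement = replacementFor(cp);
            if (replacement != null) {
                if (out == null) {
                    // First change: copy the untouched prefix in one step.
                    out = new StringBuilder(input.length() + 16).append(input, 0, i);
                }
                out.append(replacement);
            } else if (out != null) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        // Point 2: hand back the original string when nothing was converted.
        return out == null ? input : out.toString();
    }

    private static String replacementFor(int cp) {
        switch (cp) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            default:
                // Point 3 (assumed threshold): numeric reference for unmapped characters.
                return cp > 0x7E ? "&#" + cp + ";" : null;
        }
    }
}
```

Under these assumptions, escapeHtml("\u0101") yields "&#257;", and a string that needs no escaping is returned as the same object without allocating a builder.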



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
