[ https://issues.apache.org/jira/browse/MIME4J-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746393#comment-16746393 ]

ASF GitHub Bot commented on MIME4J-283:
---------------------------------------

GitHub user hirthwork opened a pull request:

    https://github.com/apache/james-mime4j/pull/26

    MIME4J-283 DecoderUtil performance fix

    DecoderUtil currently uses the following regex pattern for RFC 2047
    encoded words:
    `"(.*?)=\\?(.+?)\\?(\\w)\\?(.*?)\\?="`
    The first capturing group, `(.*?)`, is very expensive: it forces the
    next pattern node to be evaluated at every input character. Because of
    this, decoding a 4 KB input (a `To:` field with 40-80 recipients) takes
    up to 200ms on modern CPUs.
    
    At the same time, this capturing group is used only to store the
    separator text between encoded words. The proposed patch reuses the
    existing `tailIndex` for separator text extraction, and decoding the
    same input now takes only 1-2ms.
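
A minimal Java sketch of what the original pattern captures, using a made-up header value; the class `OldPatternGroups` and the method `groups` are hypothetical and not part of mime4j. Group 1 holds the text between the previous match and the next encoded word. Because `(.*?)` is lazy, the engine extends it one character at a time and retries `=\?` and the rest of the pattern after each step; when no further encoded word follows, `find()` repeats that scan from every later start position, so the final, unsuccessful search can grow roughly quadratically with the input length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OldPatternGroups {
    // Pattern quoted in MIME4J-283. Group 1 is the lazy prefix holding the
    // separator text; groups 2-4 are charset, encoding letter and encoded text.
    static final Pattern OLD = Pattern.compile("(.*?)=\\?(.+?)\\?(\\w)\\?(.*?)\\?=");

    // Returns one {separator, charset, encoding, text} array per encoded word.
    static List<String[]> groups(String body) {
        List<String[]> result = new ArrayList<>();
        Matcher m = OLD.matcher(body);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical header value with two encoded words.
        String body = "=?UTF-8?Q?Ann?= <a@example.com>, =?UTF-8?B?Qm9i?= <b@example.com>";
        for (String[] g : groups(body)) {
            System.out.println("separator=[" + g[0] + "] charset=" + g[1]
                    + " encoding=" + g[2] + " text=" + g[3]);
        }
    }
}
```

With this input the first match has an empty separator and the second has `" <a@example.com>, "`. The trailing `" <b@example.com>"` is captured by no group, so a caller already has to copy it from the end of the last match, which is presumably what the existing `tailIndex` is for.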

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hirthwork/james-mime4j master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/james-mime4j/pull/26.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #26
    
----
commit 028b10572f48f9e4818571c240e76db9ac6acadb
Author: Dmitry Potapov <dpotapov@...>
Date:   2019-01-18T15:17:43Z

    MIME4J-283 DecoderUtil performance fix

----


> DecoderUtil performance fix
> ---------------------------
>
>                 Key: MIME4J-283
>                 URL: https://issues.apache.org/jira/browse/MIME4J-283
>             Project: James Mime4j
>          Issue Type: Improvement
>          Components: parser (core)
>    Affects Versions: master, 0.8.2
>            Reporter: Dmitry Potapov
>            Priority: Minor
>         Attachments: patch
>
>
> DecoderUtil currently uses the following regex pattern for RFC 2047
> encoded words:
> {code:java}
> "(.*?)=\\?(.+?)\\?(\\w)\\?(.*?)\\?="
> {code}
> The first capturing group, {{(.*?)}}, is very expensive: it forces the next
> pattern node to be evaluated at every input character. Because of this,
> decoding a 4 KB input (a {{To:}} field with 40-80 recipients) takes up to
> 200ms on modern CPUs.
> At the same time, this capturing group is used only to store the separator
> text between encoded words. The proposed patch reuses the existing
> {{tailIndex}} for separator text extraction, and decoding the same input
> now takes only 1-2ms.
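
The patch itself is not included in this message, so the following is only a sketch of the idea under stated assumptions: drop the leading `(.*?)` group and take the separator as `body.substring(tailIndex, matcher.start())`. Only the pattern and the name `tailIndex` come from the issue; `TailIndexDecoder`, `decode`, `decodeWord` and `decodeQ` are hypothetical, this is not mime4j's DecoderUtil, and the handling of whitespace between adjacent encoded words follows RFC 2047 rather than anything confirmed about DecoderUtil.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TailIndexDecoder {
    // Encoded-word pattern without the leading (.*?) group, so charset,
    // encoding and encoded text are now groups 1, 2 and 3.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?(.+?)\\?(\\w)\\?(.*?)\\?=");

    static String decode(String body) {
        Matcher m = ENCODED_WORD.matcher(body);
        StringBuilder sb = new StringBuilder();
        int tailIndex = 0;                  // end of the previous match
        boolean previousWasEncoded = false;
        while (m.find()) {
            // Separator text comes from indices, not from a capturing group.
            String separator = body.substring(tailIndex, m.start());
            String decoded = decodeWord(m.group(1), m.group(2), m.group(3));
            if (decoded == null) {
                // Unsupported charset/encoding or malformed text: keep verbatim.
                sb.append(separator).append(m.group());
            } else {
                // RFC 2047: whitespace between adjacent encoded words is ignored.
                if (!(previousWasEncoded && separator.trim().isEmpty())) {
                    sb.append(separator);
                }
                sb.append(decoded);
            }
            previousWasEncoded = decoded != null;
            tailIndex = m.end();
        }
        // Copy whatever follows the last encoded word.
        return sb.append(body, tailIndex, body.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?B?Qm9i?= =?UTF-8?Q?_Smith?= <b@example.com>"));
    }

    // Returns null when the word cannot be decoded.
    private static String decodeWord(String charsetName, String encoding, String text) {
        Charset charset;
        try {
            charset = Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {  // illegal or unsupported name
            return null;
        }
        byte[] bytes;
        if (encoding.equalsIgnoreCase("B")) {
            try {
                bytes = Base64.getDecoder().decode(text);
            } catch (IllegalArgumentException e) {
                return null;
            }
        } else if (encoding.equalsIgnoreCase("Q")) {
            bytes = decodeQ(text);
        } else {
            return null;
        }
        return bytes == null ? null : new String(bytes, charset);
    }

    // Q encoding: '_' is a space and "=XX" is a hex-encoded byte; other
    // characters are written as single bytes (encoded words are ASCII).
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                out.write(' ');
            } else if (c == '=') {
                if (i + 2 >= text.length()) {
                    return null;                // truncated escape
                }
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    return null;                // not a hex escape
                }
                out.write((hi << 4) | lo);
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

Running `main` prints `Bob Smith <b@example.com>`: the single space between the two encoded words is dropped, while the text after the last word is copied from `tailIndex`. Because the pattern now starts with the literal `=\?`, a failed search no longer re-expands a lazy prefix from every start position.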



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
