[jira] [Commented] (OPENNLP-1505) Reduce object creation in NGramCharModel and StringUtil

ASF GitHub Bot (Jira) Tue, 25 Jul 2023 04:37:15 -0700


    [ 
https://issues.apache.org/jira/browse/OPENNLP-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746928#comment-17746928
 ]


ASF GitHub Bot commented on OPENNLP-1505:
-----------------------------------------

mawiesne commented on PR #543:
URL: https://github.com/apache/opennlp/pull/543#issuecomment-1649661780

   > Triggered Eval run: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/4/
   
   ```
   [INFO] Apache OpenNLP Reactor ............................. SUCCESS [  3.217 s]
   [INFO] Apache OpenNLP Tools ............................... SUCCESS [  06:46 h]
   [INFO] Apache OpenNLP UIMA Annotators ..................... SUCCESS [  2.303 s]
   [INFO] Apache OpenNLP Brat Annotator ...................... SUCCESS [  6.993 s]
   [INFO] Apache OpenNLP Morfologik Addon .................... SUCCESS [  4.517 s]
   [INFO] Apache OpenNLP Documentation ....................... SUCCESS [  0.047 s]
   [INFO] Apache OpenNLP Distribution ........................ SUCCESS [22:07 min]
   [INFO] Apache OpenNLP DL .................................. SUCCESS [ 17.742 s]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  07:09 h
   [INFO] Finished at: 2023-07-24T20:03:15Z
   [INFO] ------------------------------------------------------------------------
   Finished: SUCCESS
   ```
   

> Reduce object creation in NGramCharModel and StringUtil
> -------------------------------------------------------
>
>                 Key: OPENNLP-1505
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1505
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Language Detector
>    Affects Versions: 2.2.0
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.2.1
>
>
> During a profiling session, I noticed that many tests in 
> opennlp.tools.langdetect take quite some time to execute. Digging deeper 
> into those tests, it quickly became obvious that StringUtil#toLowerCase() was 
> creating a new String on every call (see NGramCharModel#add(...), lines 99 
> to 108).
> Because NGramCharModel calls this method very frequently, millions of String 
> objects were created while building ngrams for a given input.
> Aims:
>  * Reduce object creation, in particular the creation of millions of String 
> objects
>  * Improve runtime of the langdetect tests (and potentially others)
> Idea:
>  * Use (Heap)CharBuffer instead of String so that the underlying char arrays 
> can be re-used, instead of copying the chars over to a new String for each 
> "toLowerCase" call.
> Note:
>  * A corresponding patch / PR should be tested with/against the Evaluation 
> suite.
> Comments welcome.
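
For illustration only, the CharBuffer idea described above might look roughly like the following sketch. It is not the code from PR #543; the class name `LowerCaseSketch` and the helpers `toLowerCaseInPlace` and `ngramView` are hypothetical names introduced here. It assumes a heap-backed, writable buffer obtained via `CharBuffer.wrap(char[])`, and it lowercases char by char with `Character.toLowerCase`, which may differ from `String#toLowerCase()` for characters whose lowercase mapping is locale-dependent or not one-to-one.

```java
import java.nio.CharBuffer;

public class LowerCaseSketch {

    // Hypothetical helper: lowercases the buffer's remaining chars in place,
    // so no new String (and no new char array) is allocated.
    static CharBuffer toLowerCaseInPlace(CharBuffer buf) {
        for (int i = buf.position(); i < buf.limit(); i++) {
            // Absolute get/put: the buffer's position is left unchanged.
            buf.put(i, Character.toLowerCase(buf.get(i)));
        }
        return buf;
    }

    // Hypothetical helper: returns an n-char view starting at 'start'
    // (relative to the buffer's position). The view shares the same
    // backing char array as 'buf'; nothing is copied.
    static CharBuffer ngramView(CharBuffer buf, int start, int n) {
        return buf.subSequence(start, start + n);
    }

    public static void main(String[] args) {
        char[] text = "Hello NGram".toCharArray();
        CharBuffer buf = CharBuffer.wrap(text); // heap-backed and writable

        toLowerCaseInPlace(buf);

        // Walk over all trigrams as views on the single backing array.
        int n = 3;
        for (int i = 0; i + n <= buf.length(); i++) {
            System.out.println(ngramView(buf, i, n));
        }
    }
}
```

Whether such views can be used directly as ngram keys depends on how the surrounding code stores them: a `CharBuffer`'s `equals` and `hashCode` follow its current content, so a view over an array that is later overwritten would not be a stable map key.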



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
