[ https://issues.apache.org/jira/browse/LUCENE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503082#comment-17503082 ]
ASF subversion and git services commented on LUCENE-10171:
----------------------------------------------------------

Commit 8afec33e747ec81c2301a4b099bd26b4195a556e in lucene's branch refs/heads/main from Spyros Kapnissis
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8afec33 ]

LUCENE-10171: OpenNLPOpsFactory should directly cache DictionaryLemmatizer objects (#380)

Instead of caching dictionary strings and building multiple redundant DictionaryLemmatizer objects.

Co-authored-by: Michael Gibney <mich...@michaelgibney.net>

> Caching issue on dictionary-based OpenNLPLemmatizerFilterFactory
> ----------------------------------------------------------------
>
>                 Key: LUCENE-10171
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10171
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>   Affects Versions: 9.0, 7.7.3, 8.10
>            Reporter: Spyros Kapnissis
>            Priority: Major
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> When given a lemmas.txt dictionary file, OpenNLPLemmatizerFilterFactory
> internally caches only the string form of the dictionary, not the
> DictionaryLemmatizer object. As a result, the dictionary is parsed and a new
> DictionaryLemmatizer object is created every time
> OpenNLPLemmatizerFilterFactory.create() is called.
> In our case, with a large lemmas.txt file (5MB) and the
> OpenNLPLemmatizerFilter used in many fields across our setup and in multiple
> collections (we use Solr), we had several random OOM issues and generally
> high server load due to GC activity. After heap dump analysis we found a few
> thousand DictionaryLemmatizer instances of around 80MB each.
> By switching the cache to hold the DictionaryLemmatizer instead of the String,
> we were able to resolve these issues. I will be attaching a PR for review;
> please let me know if you have any comments.
> Thanks!
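
For readers who want to see the difference between the two caching strategies described above, here is a minimal, self-contained sketch. It is not the Lucene or OpenNLP code: ParsedDictionary is a hypothetical stand-in for DictionaryLemmatizer, the two create* methods are illustrative names, and the tab-separated word/POS-tag/lemma layout is assumed only for the example. It counts how often the dictionary is parsed under each strategy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LemmatizerCacheSketch {

  /** Hypothetical stand-in for an expensive-to-build dictionary lemmatizer. */
  static final class ParsedDictionary {
    /** Counts constructor calls so the cost of each strategy is visible. */
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    private final Map<String, String> lemmas = new HashMap<>();

    ParsedDictionary(String raw) {
      PARSE_COUNT.incrementAndGet();
      // Assumed layout for this sketch: word<TAB>posTag<TAB>lemma, one entry per line.
      for (String line : raw.split("\n")) {
        String[] cols = line.split("\t");
        if (cols.length == 3) {
          lemmas.put(cols[0] + "|" + cols[1], cols[2]);
        }
      }
    }

    /** Returns the lemma for (token, posTag), or the token itself if unknown. */
    String lemmatize(String token, String posTag) {
      return lemmas.getOrDefault(token + "|" + posTag, token);
    }
  }

  private static final Map<String, String> RAW_CACHE = new ConcurrentHashMap<>();
  private static final Map<String, ParsedDictionary> PARSED_CACHE = new ConcurrentHashMap<>();

  /** Strategy from the bug report: only the raw text is cached, so every call re-parses. */
  static ParsedDictionary createCachingString(String name, String raw) {
    String text = RAW_CACHE.computeIfAbsent(name, k -> raw);
    return new ParsedDictionary(text);
  }

  /** Strategy from the fix: the parsed object is cached and shared by all callers. */
  static ParsedDictionary createCachingObject(String name, String raw) {
    return PARSED_CACHE.computeIfAbsent(name, k -> new ParsedDictionary(raw));
  }

  public static void main(String[] args) {
    String raw = "running\tVBG\trun\nmice\tNNS\tmouse";

    for (int i = 0; i < 1000; i++) {
      createCachingString("lemmas.txt", raw);
    }
    System.out.println("parses when caching the string: " + ParsedDictionary.PARSE_COUNT.get());

    ParsedDictionary.PARSE_COUNT.set(0);
    for (int i = 0; i < 1000; i++) {
      createCachingObject("lemmas.txt", raw);
    }
    System.out.println("parses when caching the object: " + ParsedDictionary.PARSE_COUNT.get());
  }
}
```

Sharing one parsed instance is only safe if the cached object is not mutated after construction; the stand-in above is read-only once built, which is an assumption of the sketch rather than a statement about DictionaryLemmatizer.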