[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094309#comment-17094309 ]
Colvin Cowie commented on SOLR-14428:
-------------------------------------

{quote}The QueryResultKey could cache by the original string input instead of the Query object, thus it wouldn't be affected. This would short-circuit query parsing and be a performance benefit as well?
{quote}
On this point specifically, I think the downside would be that variations of a query string which are rewritten to Query objects that are equal() would no longer get the cache hits they currently do, e.g. 'A or B' and 'B or A' could create equal() Queries. And if the behaviour of the query parser is modified by params - rather than local params - then the cached results may be wrong if they're only keyed by the query string, right?

Or do you mean only use the String in the case of excessively large Query objects? In which case the first point is a reasonable compromise. I'm not sure about the second though.

{quote}BTW This issue should probably be a Lucene JIRA issue but let's see where we go with this thread further.
{quote}
I don't think I can change it now myself. But maybe there should be a separate issue for doing something nice and generic to deal with caching (large) Query objects, which I imagine might be a broader effort, and this/another for resolving the immediate issue with FuzzyQuery?

> FuzzyQuery has severe memory usage in 8.5
> -----------------------------------------
>
>                 Key: SOLR-14428
>                 URL: https://issues.apache.org/jira/browse/SOLR-14428
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 8.5, 8.5.1
>            Reporter: Colvin Cowie
>            Assignee: Andrzej Bialecki
>            Priority: Major
>         Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch,
> image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png,
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors
> while running our normal tests. After profiling it was clear that the
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png!
> And with 4 shards on 8.4.1 and 8.5.0:
> !screenshot-2.png!
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
> !screenshot-3.png!
> ~316mb in the cache
> QRC on 8.3.1
> !screenshot-4.png!
> <1mb
> With an empty cache, running this query
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed: 648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1.
> 8.3.1 is constant at 1520
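
As an illustration of the cache-key trade-off raised in the comment, here is a minimal sketch in plain Java. It does not use Solr or Lucene classes: `CacheKeySketch`, `parse`, `hitByString` and `hitByParsed` are hypothetical names, and the "parsed query" is just an order-insensitive set of terms standing in for a Query whose equals() ignores clause order. It only shows why keying by the raw string could lose hits that keying by the parsed object would get; it does not model parser params or the memory cost of holding large Query objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CacheKeySketch {

    // Hypothetical stand-in for query parsing: "A or B" becomes the set {A, B}.
    // Set equality ignores order, mimicking parsed queries that are equal()
    // even though their source strings differ.
    public static Set<String> parse(String queryString) {
        return new TreeSet<>(Arrays.asList(queryString.trim().split("\\s+or\\s+")));
    }

    // Cache keyed by the raw query string. Returns true on a hit;
    // on a miss it stores a placeholder result and returns false.
    public static boolean hitByString(Map<String, String> cache, String q) {
        return cache.putIfAbsent(q, "results") != null;
    }

    // Cache keyed by the parsed form. Returns true on a hit;
    // on a miss it stores a placeholder result and returns false.
    public static boolean hitByParsed(Map<Set<String>, String> cache, String q) {
        return cache.putIfAbsent(parse(q), "results") != null;
    }

    public static void main(String[] args) {
        Map<String, String> byString = new HashMap<>();
        Map<Set<String>, String> byParsed = new HashMap<>();

        // Warm both caches with the same query.
        hitByString(byString, "A or B");
        hitByParsed(byParsed, "A or B");

        // The reordered query misses the string-keyed cache
        // but hits the cache keyed by the parsed form.
        System.out.println("string key hit: " + hitByString(byString, "B or A"));
        System.out.println("parsed key hit: " + hitByParsed(byParsed, "B or A"));
    }
}
```

The second concern in the comment (request params changing how the same string is parsed) would need the key to include those params as well; that is left out of this sketch.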