[
https://issues.apache.org/jira/browse/SOLR-12743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760968#comment-16760968
]
Markus Jelsma commented on SOLR-12743:
--------------------------------------
Bad news: after two nodes on 7.6 with LFUCache had been running fine for just
over 24 hours, both went nuts (41M Term instances, 2M PhraseQuery instances,
etc.) and ran out of memory at about the same time, while only a few hundred
documents were being indexed.
It doesn't appear to be caused by LFUCache: we have two other 7.2.1 nodes, also
on LFUCache, and they are still running fine. So it seems that besides this
issue we might have an even worse problem, one that I cannot reproduce locally
nor consistently on production. Yesterday it happened immediately after
startup, now after 24 hours.
Reindexing the same Nutch segment that was being indexed when things went bad
doesn't trigger a new OOM. The heap filled up fast and the nodes died within
minutes, just as yesterday. There is nothing in the logs.
Has anyone else seen this?
Thanks,
Markus
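
Editorial note: instance counts like the "41M Term instances" above are typically
read from a class histogram, which the JDK can produce with
`jmap -histo:live <pid>` or `jcmd <pid> GC.class_histogram`. As a minimal sketch,
assuming two such histograms were saved to text files a few minutes apart, the
script below lists the classes whose retained bytes grew the most. The function
names and the file arguments are hypothetical and not part of Solr or the JDK.

```python
import re
import sys
from typing import Dict, List, Tuple

# Matches data rows of a JDK class histogram, for example:
#    1:      41000000     1312000000  org.apache.lucene.index.Term
# Header and "Total" rows do not match. Newer JDKs append a module column
# after the class name; it is ignored because the pattern is not end-anchored.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> Dict[str, Tuple[int, int]]:
    """Map class name -> (instance count, bytes) for each data row."""
    counts: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(3)] = (int(match.group(1)), int(match.group(2)))
    return counts


def top_growth(
    before: Dict[str, Tuple[int, int]],
    after: Dict[str, Tuple[int, int]],
    limit: int = 10,
) -> List[Tuple[str, int, int]]:
    """Return (class, instance delta, byte delta), largest byte growth first."""
    rows = []
    for name, (instances, size) in after.items():
        old_instances, old_size = before.get(name, (0, 0))
        if size > old_size:
            rows.append((name, instances - old_instances, size - old_size))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python histo_diff.py before.txt after.txt
    with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
        growth = top_growth(parse_histogram(first.read()),
                            parse_histogram(second.read()))
    for name, d_instances, d_bytes in growth:
        print(f"{d_bytes:>15} bytes  {d_instances:>12} instances  {name}")
```

Note that `-histo:live` forces a full GC before counting, so it is best used
sparingly on a node that is already under memory pressure.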
> Memory leak introduced in Solr 7.3.0
> ------------------------------------
>
> Key: SOLR-12743
> URL: https://issues.apache.org/jira/browse/SOLR-12743
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.3, 7.3.1, 7.4
> Reporter: Tomás Fernández Löbbe
> Priority: Critical
> Attachments: SOLR-12743.patch
>
>
> Reported initially by [~markus17] ([1], [2]), but other users have had the
> same issue ([3]). Some of the key parts:
> {noformat}
> Some facts:
> * problem started after upgrading from 7.2.1 to 7.3.0;
> * it occurs only in our main text search collection, all other collections
> are unaffected;
> * despite what I said earlier, it is so far unreproducible outside
> production, even when mimicking production as well as we can;
> * SortedIntDocSet instances and ConcurrentLRUCache$CacheEntry instances are
> both leaked on commit;
> * filterCache is enabled using FastLRUCache;
> * filter queries are simple field:value using strings, plus three filter
> queries for time ranges using [NOW/DAY TO NOW+1DAY/DAY] syntax for 'today',
> 'last week' and 'last month', but these are rarely used;
> * reloading the core manually frees OldGen;
> * custom URPs don't cause the problem, disabling them doesn't solve it;
> * the collection uses custom extensions for QueryComponent and
> QueryElevationComponent, ExtendedDismaxQParser and MoreLikeThisQParser, a
> whole bunch of TokenFilters, and several DocTransformers, and due to it being
> only reproducible on production, I really cannot switch these back to
> Solr/Lucene versions;
> * useFilterForSortedQuery is/was not defined in schema so it was default
> (true?), SOLR-11769 could be the culprit, i disabled it just now only for the
> node running 7.4.0, rest of collection runs 7.2.1;
> {noformat}
> {noformat}
> You were right, it was leaking exactly one SolrIndexSearcher instance on each
> commit.
> {noformat}
> And from Björn Häuser ([3]):
> {noformat}
> Problem Suspect 1
> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by
> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy
> 1.981.148.336 (38,26%) bytes.
> Biggest instances:
> • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 70.087.272
> (1,35%) bytes.
> • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 65.678.264
> (1,27%) bytes.
> • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 63.050.600
> (1,22%) bytes.
> Problem Suspect 2
> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by
> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy
> 1.373.110.208 (26,52%) bytes.
> {noformat}
> More details in the email threads.
> [1]
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201804.mbox/%3Czarafa.5ae201c6.2f85.218a781d795b07b1%40mail1.ams.nl.openindex.io%3E]
> [2]
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201806.mbox/%3Czarafa.5b351537.7b8c.647ddc93059f68eb%40mail1.ams.nl.openindex.io%3E]
> [3]
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201809.mbox/%[email protected]%3E]