I am trying to track down an issue in 2.9.2 where, during highlighting, certain
data causes rapid memory growth and an OutOfMemoryError:
-------
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.analysis.Token.growTermBuffer(Token.java:470)
at org.apache.lucene.analysis.Token.setTermBuffer(Token.java:395)
at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:200)
at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:112)
at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:249)
at com.bmc.arsys.fts.impl.lucene.LuceneFTSService.doHighlight(LuceneFTSService.java:1871)
-------
doHighlight is our method that performs the highlighting. I searched for this
issue but so far have not found any hits on Google, etc.
The beginning of doHighlight looks like this:
------
private String doHighlight(IndexReader indexReader, int docId,
        String strFieldName, Query query, String strText,
        boolean isTitle, String markupLeft, String markupRight) {
    String strBestText = null;
    try {
        TokenStream tokenStream =
                TokenSources.getTokenStream(indexReader, docId, strFieldName);
        QueryScorer scorer = new QueryScorer(query, strFieldName);
        Fragmenter fragmenter = null;
------
It fails in the getTokenStream call. By "fails" I mean that heap usage shoots
from 1.5 GB to beyond 8.0 GB, at which point we stopped experimenting with
adding memory. The entire collection directory is only 4.4 GB, and the search
strings are usually very simple, so it seems to be related to the data that is
returned in some cases (e.g. a search for "db" can cause this to shoot up even
when there are only a few (<10) hits).
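My working theory (an assumption on my part, not something I have confirmed in
the 2.9.2 source) is that getTokenStream rebuilds the stream from the stored
term vector by materializing one Token object per term position before sorting
them, so heap usage scales with the total position count of the field rather
than with the on-disk index size. A back-of-the-envelope sketch of that
estimate (the per-token overhead constant is a rough guess):

```java
// Rough heap estimate for a token stream rebuilt from a term vector,
// assuming one Token (with its own char[] term buffer) per stored position.
public class TokenMemoryEstimate {
    static long estimateBytes(long positions, int avgTermLength) {
        long perTokenOverhead = 64;       // object + array headers, offsets (rough guess)
        long termBuffer = 2L * avgTermLength; // UTF-16 chars
        return positions * (perTokenOverhead + termBuffer);
    }

    public static void main(String[] args) {
        // A very large text field with ~20M term positions of short tokens:
        long bytes = estimateBytes(20_000_000L, 5);
        System.out.println(bytes / (1024 * 1024) + " MB"); // on the order of 1.4 GB
    }
}
```

If that is roughly right, a single document with a huge tokenized field would
explain the jump from 1.5 GB to 8+ GB even when the query itself only matches
a handful of hits.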
Has anyone seen this before?