I am trying to track down an issue in 2.9.2 where, during highlighting, certain
data causes rapid memory growth and an OutOfMemoryError:
-------
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.analysis.Token.growTermBuffer(Token.java:470)
at org.apache.lucene.analysis.Token.setTermBuffer(Token.java:395)
at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:200)
at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:112)
at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:249)
at com.bmc.arsys.fts.impl.lucene.LuceneFTSService.doHighlight(LuceneFTSService.java:1871)
-------
doHighlight is our method that performs the highlighting. I searched for this
issue but so far have not found any hits on Google, etc.
The beginning of doHighlight looks like this:
------
private String doHighlight(IndexReader indexReader, int docId,
        String strFieldName, Query query, String strText,
        boolean isTitle, String markupLeft, String markupRight) {
    String strBestText = null;
    try {
        TokenStream tokenStream =
                TokenSources.getTokenStream(indexReader, docId, strFieldName);
        QueryScorer scorer = new QueryScorer(query, strFieldName);
        Fragmenter fragmenter = null;
------
It fails in the getTokenStream call. By "fails" I mean that heap usage shoots
from 1.5 GB to beyond 8.0 GB, at which point we stopped experimenting with
adding memory. The entire collection directory is only 4.4 GB, and the search
strings are usually very simple, so it seems to be related to the data that is
returned in some cases (e.g. a search for "db" can cause this to shoot up even
when there are only a few (<10) hits).
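My working theory (an assumption on my part, not something I have confirmed in
the 2.9.2 source) is that getTokenStream rebuilds the stream from the stored
term vector by materializing one Token object per term position before sorting
them, so heap usage scales with the total position count of the field rather
than with the on-disk index size. A back-of-the-envelope sketch of that
estimate (the per-token overhead constant is a rough guess):

```java
// Rough heap estimate for a token stream rebuilt from a term vector,
// assuming one Token (with its own char[] term buffer) per stored position.
public class TokenMemoryEstimate {
    static long estimateBytes(long positions, int avgTermLength) {
        long perTokenOverhead = 64;       // object + array headers, offsets (rough guess)
        long termBuffer = 2L * avgTermLength; // UTF-16 chars
        return positions * (perTokenOverhead + termBuffer);
    }

    public static void main(String[] args) {
        // A very large text field with ~20M term positions of short tokens:
        long bytes = estimateBytes(20_000_000L, 5);
        System.out.println(bytes / (1024 * 1024) + " MB"); // on the order of 1.4 GB
    }
}
```

If that is roughly right, a single document with a huge tokenized field would
explain the jump from 1.5 GB to 8+ GB even when the query itself only matches
a handful of hits.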
Has anyone seen this before?