Highlight with Proximity search throws an exception

Juraj Jurčo Thu, 01 Oct 2020 06:54:56 -0700

Hi guys,
we are trying to implement search and we have run into a strange
situation. When our text contains an apostrophe followed by a single
character AND our search query is composed of exactly two letters
followed by the ~ operator (which the classic query parser treats as a
fuzzy search on a single term, e.g. se~) AND we use highlighting, we get
an exception:


*java.lang.IllegalArgumentException: boost must be a positive float, got -1.0*


It seems the problem is at FuzzyTermsEnum.java:271 (float similarity =
1.0f - (float) ed / (float) minTermLength): when this line is reached with
ed=2, the computed similarity is negative and it is then used as the boost.
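
To make the arithmetic concrete, here is a minimal standalone sketch of that formula. It does not call Lucene. It assumes that SimpleAnalyzer splits "doesn't" at the apostrophe into the terms "doesn" and "t", and that minTermLength is the shorter of the query term and the indexed term. The class and method names are made up for illustration, and plain Levenshtein distance stands in for whatever distance Lucene computes internally (for these inputs the result should be the same).

```java
public class FuzzySimilaritySketch {

    // Plain Levenshtein distance (insert, delete, substitute) using two rows.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Same expression as the line quoted from FuzzyTermsEnum.java:271.
    // Assumption: minTermLength is the shorter of the two term lengths.
    public static float similarity(String queryTerm, String indexedTerm) {
        int ed = editDistance(queryTerm, indexedTerm);
        int minTermLength = Math.min(queryTerm.length(), indexedTerm.length());
        return 1.0f - (float) ed / (float) minTermLength;
    }

    public static void main(String[] args) {
        System.out.println(similarity("se", "t")); // prints -1.0
        System.out.println(similarity("s", "t"));  // prints 0.0
    }
}
```

Under these assumptions, "se" against the one-letter term "t" gives ed=2 and minTermLength=1, so the similarity is 1 - 2/1 = -1.0, which matches the value in the exception message. The query "s~" gives 0.0, which is not negative, and "se~1" or "sel~" would not reach "t" at all because the distance exceeds the allowed number of edits.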

I was able to reproduce the error with the following code:

import java.io.IOException;
import java.nio.file.Path;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.jupiter.api.Test;

class FindSqlHighlightTest {

   @Test
   void reproduceHighlightProblem() throws IOException, ParseException, InvalidTokenOffsetsException {
      String text = "doesn't";
      String field = "text";
      //NOK: se~, se~2 and any higher number
      //OK: sel~, s~, se~1
      String uQuery = "se~";
      int maxStartOffset = -1;
      Analyzer analyzer = new SimpleAnalyzer();

      Path indexLocation = Path.of("temp", "reproduceHighlightProblem").toAbsolutePath();
      if (indexLocation.toFile().exists()) {
         FileUtils.deleteDirectory(indexLocation.toFile());
      }
      Directory indexDir = FSDirectory.open(indexLocation);

      //Create index
      IndexWriterConfig dimsIndexWriterConfig = new IndexWriterConfig(analyzer);
      dimsIndexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      IndexWriter idxWriter = new IndexWriter(indexDir, dimsIndexWriterConfig);
      //add doc
      Document doc = new Document();
      doc.add(new TextField(field, text, Field.Store.NO));
      idxWriter.addDocument(doc);
      //commit
      idxWriter.commit();
      idxWriter.close();

      //search & highlight
      Query query = new QueryParser(field, analyzer).parse(uQuery);
      Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));
      TokenStream tokenStream = TokenSources.getTokenStream(field, null, text, analyzer, maxStartOffset);
      String highlighted = highlighter.getBestFragment(tokenStream, text);
      System.out.println(highlighted);
   }
}


Could you please confirm whether this is a bug in Lucene or whether we are
doing something that is not allowed?

Thanks a lot!
Best,
Juraj+
