Hi Juraj+,

This indeed smells like a bug.  FuzzyTermsEnum should never try to set a
negative boost!
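
For anyone following along, here is a minimal standalone sketch (not Lucene source, just the arithmetic from the FuzzyTermsEnum block quoted below) of how an edit distance of 2 against a one-character indexed term ("t", from SimpleAnalyzer splitting "doesn't") drives the similarity negative, and how dividing by the longer of the two lengths would keep it in [0, 1]:

```java
// Standalone sketch of the similarity arithmetic in FuzzyTermsEnum
// (hypothetical helper, not Lucene code).
public class SimilaritySketch {

    // Computes 1 - ed/len, where len is either the shorter (current
    // Lucene behavior) or the longer of the two term lengths.
    static float similarity(int ed, int queryLen, int candidateLen, boolean useMax) {
        int len = useMax ? Math.max(queryLen, candidateLen)
                         : Math.min(queryLen, candidateLen);
        return 1.0f - (float) ed / (float) len;
    }

    public static void main(String[] args) {
        int ed = 2;       // edit distance between query "se" and term "t"
        int queryLen = 2; // "se"
        int candLen = 1;  // "t"

        // Dividing by the min length yields a negative boost
        System.out.println(similarity(ed, queryLen, candLen, false)); // -1.0

        // Dividing by the max length stays within [0, 1]
        System.out.println(similarity(ed, queryLen, candLen, true)); // 0.0
    }
}
```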

Could you open an issue and open a PR (or attach a patch) with your test
case?  Thank you for boiling this down.  This part really made me chuckle:

> When our text contains an apostrophe followed by a single character AND
our search query is composed of exactly two letters followed by
proximity search AND we use highlighting, we get an exception:

Mike McCandless

http://blog.mikemccandless.com


On Thu, Oct 1, 2020 at 12:48 PM Michael Sokolov <[email protected]> wrote:

> I traced this to this block in FuzzyTermsEnum:
>
>     if (ed == 0) { // exact match
>       boostAtt.setBoost(1.0F);
>     } else {
>       final int codePointCount = UnicodeUtil.codePointCount(term);
>       int minTermLength = Math.min(codePointCount, termLength);
>
>       float similarity = 1.0f - (float) ed / (float) minTermLength;
>       boostAtt.setBoost(similarity);
>     }
>
> where in your test ed (edit distance) was 2 and minTermLength 1,
> leading to negative boost.
>
> I don't really understand this code at all, but I wonder if it should
> divide by maxTermLength instead of minTermLength?
>
> On Thu, Oct 1, 2020 at 9:54 AM Juraj Jurčo <[email protected]> wrote:
> >
> > Hi guys,
> > we are trying to implement search and we have experienced a strange
> > situation. When our text contains an apostrophe followed by a single
> > character AND our search query is composed of exactly two letters
> > followed by proximity search AND we use highlighting, we get an exception:
> >
> >> java.lang.IllegalArgumentException: boost must be a positive float, got -1.0
> >
> >
> > It seems there is a problem at FuzzyTermsEnum.java:271 (float similarity
> > = 1.0f - (float) ed / (float) minTermLength) when it reaches that line
> > with ed=2 and sets a negative boost.
> >
> > I was able to reproduce the error with the following code:
> >
> > import java.io.IOException;
> > import java.nio.file.Path;
> >
> > import org.apache.commons.io.FileUtils;
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.core.SimpleAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.document.TextField;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.queryparser.classic.ParseException;
> > import org.apache.lucene.queryparser.classic.QueryParser;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.highlight.Highlighter;
> > import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
> > import org.apache.lucene.search.highlight.QueryScorer;
> > import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> > import org.apache.lucene.search.highlight.TokenSources;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.FSDirectory;
> > import org.junit.jupiter.api.Test;
> >
> > class FindSqlHighlightTest {
> >
> >    @Test
> >    void reproduceHighlightProblem() throws IOException, ParseException, InvalidTokenOffsetsException {
> >       String text = "doesn't";
> >       String field = "text";
> >       //NOK: se~, se~2 and any higher number
> >       //OK: sel~, s~, se~1
> >       String uQuery = "se~";
> >       int maxStartOffset = -1;
> >       Analyzer analyzer = new SimpleAnalyzer();
> >
> >       Path indexLocation = Path.of("temp", "reproduceHighlightProblem").toAbsolutePath();
> >       if (indexLocation.toFile().exists()) {
> >          FileUtils.deleteDirectory(indexLocation.toFile());
> >       }
> >       Directory indexDir = FSDirectory.open(indexLocation);
> >
> >       //Create index
> >       IndexWriterConfig dimsIndexWriterConfig = new IndexWriterConfig(analyzer);
> >       dimsIndexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
> >       IndexWriter idxWriter = new IndexWriter(indexDir, dimsIndexWriterConfig);
> >       //add doc
> >       Document doc = new Document();
> >       doc.add(new TextField(field, text, Field.Store.NO));
> >       idxWriter.addDocument(doc);
> >       //commit
> >       idxWriter.commit();
> >       idxWriter.close();
> >
> >       //search & highlight
> >       Query query = new QueryParser(field, analyzer).parse(uQuery);
> >       Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));
> >       TokenStream tokenStream = TokenSources.getTokenStream(field, null, text, analyzer, maxStartOffset);
> >       String highlighted = highlighter.getBestFragment(tokenStream, text);
> >       System.out.println(highlighted);
> >    }
> > }
> >
> >
> > Could you please confirm whether it's a bug in Lucene or whether we do
> something that is not allowed?
> >
> > Thanks a lot!
> > Best,
> > Juraj+
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
