Hi Juraj+,

This indeed smells like a bug. FuzzyTermsEnum should never try to set a negative boost!
Could you open an issue and open a PR (or attach a patch) with your test case? Thank you for boiling this down. This part really made me chuckle:

> When our text contains an apostrophe followed by a single character AND our search query is composed of exactly two letters followed by a proximity search AND we use highlighting, we get an exception:

Mike McCandless
http://blog.mikemccandless.com

On Thu, Oct 1, 2020 at 12:48 PM Michael Sokolov <[email protected]> wrote:

> I traced this to this block in FuzzyTermsEnum:
>
>     if (ed == 0) { // exact match
>       boostAtt.setBoost(1.0F);
>     } else {
>       final int codePointCount = UnicodeUtil.codePointCount(term);
>       int minTermLength = Math.min(codePointCount, termLength);
>
>       float similarity = 1.0f - (float) ed / (float) minTermLength;
>       boostAtt.setBoost(similarity);
>     }
>
> where in your test ed (the edit distance) was 2 and minTermLength was 1,
> leading to a negative boost.
>
> I don't really understand this code at all, but I wonder if it should
> divide by maxTermLength instead of minTermLength?
>
> On Thu, Oct 1, 2020 at 9:54 AM Juraj Jurčo <[email protected]> wrote:
> >
> > Hi guys,
> > we are trying to implement search and we have experienced a strange
> > situation. When our text contains an apostrophe followed by a single
> > character AND our search query is composed of exactly two letters
> > followed by a proximity search AND we use highlighting, we get an exception:
> >
> >> java.lang.IllegalArgumentException: boost must be a positive float, got -1.0
> >
> > It seems there is a problem at FuzzyTermsEnum.java:271 (float similarity
> > = 1.0f - (float) ed / (float) minTermLength) when it reaches it with ed=2
> > and it sets a negative boost.
> >
> > I was able to reproduce the error with the following code:
> >
> > import java.io.IOException;
> > import java.nio.file.Path;
> >
> > import org.apache.commons.io.FileUtils;
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.core.SimpleAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.document.TextField;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.queryparser.classic.ParseException;
> > import org.apache.lucene.queryparser.classic.QueryParser;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.highlight.Highlighter;
> > import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
> > import org.apache.lucene.search.highlight.QueryScorer;
> > import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> > import org.apache.lucene.search.highlight.TokenSources;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.FSDirectory;
> > import org.junit.jupiter.api.Test;
> >
> > class FindSqlHighlightTest {
> >
> >   @Test
> >   void reproduceHighlightProblem() throws IOException, ParseException, InvalidTokenOffsetsException {
> >     String text = "doesn't";
> >     String field = "text";
> >     // NOK: se~, se~2 and any higher number
> >     // OK: sel~, s~, se~1
> >     String uQuery = "se~";
> >     int maxStartOffset = -1;
> >     Analyzer analyzer = new SimpleAnalyzer();
> >
> >     Path indexLocation = Path.of("temp", "reproduceHighlightProblem").toAbsolutePath();
> >     if (indexLocation.toFile().exists()) {
> >       FileUtils.deleteDirectory(indexLocation.toFile());
> >     }
> >     Directory indexDir = FSDirectory.open(indexLocation);
> >
> >     // Create the index
> >     IndexWriterConfig dimsIndexWriterConfig = new IndexWriterConfig(analyzer);
> >     dimsIndexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
> >     IndexWriter idxWriter = new IndexWriter(indexDir, dimsIndexWriterConfig);
> >     // Add a doc
> >     Document doc = new Document();
> >     doc.add(new TextField(field, text, Field.Store.NO));
> >     idxWriter.addDocument(doc);
> >     // Commit
> >     idxWriter.commit();
> >     idxWriter.close();
> >
> >     // Search & highlight
> >     Query query = new QueryParser(field, analyzer).parse(uQuery);
> >     Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));
> >     TokenStream tokenStream = TokenSources.getTokenStream(field, null, text, analyzer, maxStartOffset);
> >     String highlighted = highlighter.getBestFragment(tokenStream, text);
> >     System.out.println(highlighted);
> >   }
> > }
> >
> > Could you please confirm whether it's a bug in Lucene or whether we are doing something that is not allowed?
> >
> > Thanks a lot!
> > Best,
> > Juraj+
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
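P.S.: the arithmetic from the quoted FuzzyTermsEnum block can be reproduced in isolation. The following is a standalone sketch (the class and method names are illustrative, not Lucene API) showing why dividing by the shorter length goes negative when ed exceeds minTermLength, while dividing by the longer length, as suggested above, would stay in [0, 1):

```java
// Standalone sketch of the similarity arithmetic discussed above.
// Not Lucene code; names are illustrative only.
public class FuzzySimilaritySketch {

    // Mirrors the quoted block: similarity = 1 - ed / min(lengths).
    // Goes negative whenever ed > min(codePointCount, termLength).
    static float similarityByMin(int ed, int codePointCount, int termLength) {
        int minTermLength = Math.min(codePointCount, termLength);
        return 1.0f - (float) ed / (float) minTermLength;
    }

    // Hypothetical variant dividing by the longer length instead.
    // Since ed can never exceed the longer string's length, this
    // stays non-negative.
    static float similarityByMax(int ed, int codePointCount, int termLength) {
        int maxTermLength = Math.max(codePointCount, termLength);
        return 1.0f - (float) ed / (float) maxTermLength;
    }

    public static void main(String[] args) {
        // The failing case from the thread: a 1-code-point indexed term
        // ("t" from "doesn't") vs. the 2-letter query "se", edit distance 2.
        System.out.println(similarityByMin(2, 1, 2)); // -1.0 -> the bad boost
        System.out.println(similarityByMax(2, 1, 2)); // 0.0  -> stays valid
    }
}
```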
