[ https://issues.apache.org/jira/browse/LUCENE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe resolved LUCENE-7231.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 5.6
                   5.5.2

> Problem with NGramAnalyzer, PhraseQuery and Highlighter
> -------------------------------------------------------
>
>                 Key: LUCENE-7231
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7231
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 5.4.1
>            Reporter: Eva Popenda
>            Assignee: Alan Woodward
>             Fix For: 6.1, 5.5.2, 5.6, 6.0.1
>
>         Attachments: LUCENE-7231.patch
>
>
> Using the Highlighter with N-GramAnalyzer and PhraseQuery and searching for a
> substring with length = N yields the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Less than 2 subSpans.size():1
>     at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:40)
>     at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:56)
>     at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:292)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:137)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:506)
>     at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
>     at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
> {noformat}
> Below is a JUnit-Test reproducing this behavior. In case of searching for a
> string with more than N characters or using NGramPhraseQuery this problem
> doesn't occur.
> Why is it that more than 1 subSpans are required?
> {code:java}
> public class HighlighterTest {
>
>   @Rule
>   public final ExpectedException exception = ExpectedException.none();
>
>   @Test
>   public void testHighlighterWithPhraseQueryThrowsException() throws IOException, InvalidTokenOffsetsException {
>     final Analyzer analyzer = new NGramAnalyzer(4);
>     final String fieldName = "substring";
>     final List<BytesRef> list = new ArrayList<>();
>     list.add(new BytesRef("uchu"));
>     final PhraseQuery query = new PhraseQuery(fieldName, list.toArray(new BytesRef[list.size()]));
>     final QueryScorer fragmentScorer = new QueryScorer(query, fieldName);
>     final SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
>     exception.expect(IllegalArgumentException.class);
>     exception.expectMessage("Less than 2 subSpans.size():1");
>     final Highlighter highlighter = new Highlighter(formatter, TextEncoder.NONE.getEncoder(), fragmentScorer);
>     highlighter.setTextFragmenter(new SimpleFragmenter(100));
>     final String fragment = highlighter.getBestFragment(analyzer, fieldName, "Buchung");
>     assertEquals("B<b>uchu</b>ng", fragment);
>   }
>
>   public final class NGramAnalyzer extends Analyzer {
>     private final int minNGram;
>
>     public NGramAnalyzer(final int minNGram) {
>       super();
>       this.minNGram = minNGram;
>     }
>
>     @Override
>     protected TokenStreamComponents createComponents(final String fieldName) {
>       final Tokenizer source = new NGramTokenizer(minNGram, minNGram);
>       return new TokenStreamComponents(source);
>     }
>   }
> }
> {code}
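The report says the failure appears only when the search string has exactly N characters. One way to see why is to count the fixed-size grams such a string produces. The sketch below is plain Java and does not call Lucene; `NGramSketch` and `grams` are hypothetical names used only for illustration, and the sketch assumes the analyzer emits every substring of exactly N characters, as the reporter's `NGramTokenizer(minNGram, minNGram)` setup suggests.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Hypothetical helper (not a Lucene API): returns every substring of
    // exactly n characters, in left-to-right order.
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        // Length == N: a single gram, so a phrase built from it has one term.
        System.out.println(grams("uchu", 4));    // [uchu]
        // Length > N: two or more grams, so the phrase has several terms.
        System.out.println(grams("uchun", 4));   // [uchu, chun]
        // The highlighted text from the test above.
        System.out.println(grams("Buchung", 4)); // [Buch, uchu, chun, hung]
    }
}
```

With N = 4, "uchu" yields one gram while "uchun" yields two. A phrase made of a single gram would plausibly give the span query only one sub-span, which is consistent with the `subSpans.size():1` in the exception message and with the observation that longer search strings do not trigger it.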