[ https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Høydahl updated LUCENE-6336:
--------------------------------
    Labels: lookup suggester  (was: )

> AnalyzingInfixSuggester needs duplicate handling
> ------------------------------------------------
>
>                 Key: LUCENE-6336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6336
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.10.3, 5.0
>            Reporter: Jan Høydahl
>              Labels: lookup, suggester
>         Attachments: LUCENE-6336.patch
>
>
> Spinoff from LUCENE-5833 but otherwise unrelated.
> {{AnalyzingInfixSuggester}} is backed by a Lucene index and stores the
> payload and score together with the suggest text.
> I did some testing with Solr, producing the DocumentDictionary from an index
> with multiple documents containing the same text, but with random weights
> between 0 and 100. I then got identical duplicate suggestions, sorted by
> weight:
> {code}
> {
>   "suggest":{"languages":{
>     "engl":{
>       "numFound":101,
>       "suggestions":[{
>           "term":"<b>Engl</b>ish",
>           "weight":100,
>           "payload":"0"},
>         {
>           "term":"<b>Engl</b>ish",
>           "weight":99,
>           "payload":"0"},
>         {
>           "term":"<b>Engl</b>ish",
>           "weight":98,
>           "payload":"0"},
>         ---etc all the way down to 0---
> {code}
> I also reproduced the same behavior in AnalyzingInfixSuggester directly, so
> some duplicate removal is needed here, either while building the local
> suggest index or during lookup. Only the highest-weight suggestion for a
> given term should be returned.
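
The description asks that only the highest-weight suggestion per term be returned, with duplicates removed either at build time or during lookup. As a rough illustration of the lookup-side option only, here is a minimal sketch in plain Java. It is not the attached LUCENE-6336.patch and does not use Lucene's API: the `Suggestion` class, the `dedupe` method, and the "Englisch" sample entry are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SuggestionDedup {

    /** Hypothetical stand-in for one lookup result; not a Lucene class. */
    static final class Suggestion {
        final String term;
        final long weight;
        final String payload;

        Suggestion(String term, long weight, String payload) {
            this.term = term;
            this.weight = weight;
            this.payload = payload;
        }
    }

    /**
     * Keeps only the highest-weight suggestion for each distinct term and
     * returns the survivors ordered by descending weight.
     */
    static List<Suggestion> dedupe(List<Suggestion> results) {
        Map<String, Suggestion> bestByTerm = new LinkedHashMap<>();
        for (Suggestion s : results) {
            Suggestion seen = bestByTerm.get(s.term);
            // Replace the stored entry only when the new one weighs more.
            if (seen == null || s.weight > seen.weight) {
                bestByTerm.put(s.term, s);
            }
        }
        List<Suggestion> out = new ArrayList<>(bestByTerm.values());
        out.sort((a, b) -> Long.compare(b.weight, a.weight));
        return out;
    }

    public static void main(String[] args) {
        // Sample data modelled on the response in the issue; the last
        // entry is made up to show that distinct terms survive.
        List<Suggestion> raw = Arrays.asList(
            new Suggestion("English", 100, "0"),
            new Suggestion("English", 99, "0"),
            new Suggestion("English", 98, "0"),
            new Suggestion("Englisch", 42, "1"));

        for (Suggestion s : dedupe(raw)) {
            System.out.println(s.term + " " + s.weight);
        }
        // prints:
        // English 100
        // Englisch 42
    }
}
```

Filtering at lookup time like this leaves the suggest index untouched, but the caller would have to over-fetch to still fill the requested number of results; deduplicating while building the index avoids that at the cost of discarding the lower-weight entries up front.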