[ https://issues.apache.org/jira/browse/LUCENE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500848#comment-16500848 ]
David Smiley commented on LUCENE-8344:
--------------------------------------

RE NRT Doc Suggester: "This is expected." Okay, I see what you mean. I guess any user (past, present, or future) who wants to use preservePositionIncrements=false effectively needs to use CompletionAnalyzer/CompletionTokenStream at both index _and_ query time. The existing tests don't do that; they use the input analyzer at query time. The two queries one test does use, "fo" and "foob", don't demonstrate something important it should be testing for: position increments (stopwords) in the _query_. The same goes for some similar test methods here (both positive and negative assertions). I'll try to improve this somewhat.

RE AnalyzingSuggester: Hmmm. What if the first phase of the "exactFirst" logic captured the "output2" lookup results somewhere the second pass could examine them? I think this would be more robust, and the second phase wouldn't even need to invoke sameSurfaceForm. If the FST was built with the bug (7.3 or prior), then with this setting an exact match of a trailing stopword wouldn't be recognized as an exact match, but I think that's a minor loss that reindexing easily fixes?

> TokenStreamToAutomaton doesn't ignore trailing posInc when
> preservePositionIncrements=false
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8344
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8344
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/suggest
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8344.patch, LUCENE-8344.patch
>
>
> TokenStreamToAutomaton in Lucene core is used by the AnalyzingSuggester
> (incl. FuzzySuggester subclass) and NRT Document Suggester and soon the
> SolrTextTagger. It has a setting {{preservePositionIncrements}} defaulting
> to true. If it's set to false (e.g. to ignore stopwords) and if there is a
> _trailing_ position increment greater than 1, TS2A will _still_ add position
> increments (holes) into the automata even though it was configured not to.
> I'm filing this issue separate from LUCENE-8332, where I first found it. The
> fix is very simple, but I'm concerned about back-compat ramifications, so I'm
> filing it separately. I'll attach a patch to show the problem.
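To make the trailing-hole behavior described in the issue concrete, here is a minimal, self-contained Java model. It is not Lucene's TokenStreamToAutomaton. The class TrailingHoleDemo, the Token holder, the "<HOLE>" string and the `fixed` flag are all hypothetical. The "automaton" is simplified to a flat list of symbols for a linear stream with no stacked tokens (every posInc >= 1). It also assumes that each unit of the trailing position increment stands for one skipped position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model (NOT Lucene's TokenStreamToAutomaton): a linear
// token stream with no stacked tokens becomes a flat list of symbols.
class TrailingHoleDemo {

    // Hypothetical marker for one skipped position (e.g. a removed stopword).
    static final String HOLE = "<HOLE>";

    // A term plus the position increment it carries (assumed >= 1 here).
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // endPosInc: positions skipped after the last token (assumption: one HOLE each).
    // fixed == false mimics the reported bug; fixed == true honors preservePosIncs.
    static List<String> toSymbols(List<Token> tokens, int endPosInc,
                                  boolean preservePosIncs, boolean fixed) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            // A posInc of N means N - 1 positions were skipped before this token;
            // interior holes are already gated by the flag.
            if (preservePosIncs) {
                for (int i = 1; i < t.posInc; i++) {
                    out.add(HOLE);
                }
            }
            out.add(t.term);
        }
        // Buggy mode emits trailing holes regardless of preservePosIncs.
        boolean emitTrailingHoles = !fixed || preservePosIncs;
        if (emitTrailingHoles) {
            for (int i = 0; i < endPosInc; i++) {
                out.add(HOLE);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "foo the" with "the" removed as a stopword: one token, one trailing gap.
        List<Token> fooThe = List.of(new Token("foo", 1));
        System.out.println(toSymbols(fooThe, 1, false, false)); // [foo, <HOLE>] (buggy)
        System.out.println(toSymbols(fooThe, 1, false, true));  // [foo] (fixed)
        System.out.println(toSymbols(fooThe, 1, true, true));   // [foo, <HOLE>] (holes wanted)

        // "foo the bar": the interior gap is dropped when preservePosIncs is false.
        List<Token> fooTheBar = List.of(new Token("foo", 1), new Token("bar", 2));
        System.out.println(toSymbols(fooTheBar, 0, false, true)); // [foo, bar]
    }
}
```

In the buggy mode, "foo the" yields [foo, <HOLE>] while "foo" yields [foo]. A user who relies on preservePositionIncrements=false to ignore stopwords therefore doesn't get the equivalence they configured. The model also hints, roughly, at the back-compat concern: anything built with the old behavior still carries the trailing hole, while a fixed query-time conversion would not produce it.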