[https://issues.apache.org/jira/browse/LUCENE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500848#comment-16500848]
David Smiley commented on LUCENE-8344:
--------------------------------------
RE NRT Doc Suggester: "This is expected". Okay, I see what you mean. I guess
if any user (past, present, or future) wants to use
preservePositionIncrements=false effectively, they need to use
CompletionAnalyzer/CompletionTokenStream at both index _and_ query time. The
existing tests don't do that; they use the input analyzer at query time. The
two queries used in one test, "fo" and "foob", didn't exercise something
important this test should cover: position increments (stopwords) in the
_query_. The same goes for some similar test methods here (both positive and
negative assertions). I'll try to improve this some.
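To illustrate why a query containing a stopword matters, here is a minimal sketch under stated assumptions. It is a hypothetical toy model, not Lucene code: a whitespace "analyzer" drops words from a small stopword set, and when position increments are preserved each removed position becomes a `<HOLE>` label. The class name, the `<HOLE>` marker, and the stopword list are all made up for illustration. Index-time and query-time analysis only agree when both sides use the same setting, and only a query with a stopword exposes the difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not Lucene's API): a whitespace analyzer with a
 * stopword filter whose output is converted to a list of labels.
 */
public class QueryStopwordSketch {
    static final String HOLE = "<HOLE>";                      // hypothetical hole marker
    static final Set<String> STOPWORDS = Set.of("the", "a", "of");

    /** Analyzes text; removed stopwords leave holes only if preservePosIncs. */
    static List<String> analyze(String text, boolean preservePosIncs) {
        List<String> labels = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (STOPWORDS.contains(word)) {
                if (preservePosIncs) labels.add(HOLE);        // skipped position becomes a hole
            } else {
                labels.add(word);
            }
        }
        return labels;
    }

    /** True if the query labels are a prefix of the indexed labels. */
    static boolean prefixMatches(List<String> indexed, List<String> query) {
        return query.size() <= indexed.size()
            && indexed.subList(0, query.size()).equals(query);
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("foo the bar", false);      // [foo, bar]
        List<String> sameSettings = analyze("foo the", false);     // [foo]
        List<String> inputAnalyzer = analyze("foo the", true);     // [foo, <HOLE>]
        System.out.println(prefixMatches(indexed, sameSettings));  // true
        System.out.println(prefixMatches(indexed, inputAnalyzer)); // false
    }
}
```

A query like "fo" or "foob" produces identical labels under either setting in this model, which is why such queries cannot catch a mismatch; a positive assertion ("foo the" matches) paired with a negative one ("bar" does not) covers both sides.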
RE AnalyzingSuggester: Hmmm. What if the first phase of the "exactFirst" logic
captured the "output2" lookup results somewhere the second pass could examine
them? I think this would be more robust, and the second phase wouldn't even
need to invoke sameSurfaceForm. If the FST was built with the bug (7.3 or
earlier), then an exact match on input with a trailing stopword and this
setting wouldn't be recognized as an exact match, but I think that's a minor
loss that reindexing easily fixes?
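The proposed two-phase idea can be sketched roughly as follows. This is a hypothetical simplification, not AnalyzingSuggester's actual code: entries are plain records with an analyzed key, a surface form (standing in for "output2"), and a weight, and a linear scan replaces the FST search. Phase 1 collects exact matches and records their outputs in a set; phase 2 ranks prefix matches and skips anything already in that set, so it never has to re-derive exactness via a sameSurfaceForm-style check. The names `ExactFirstSketch`, `Entry`, and `lookup` are assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a two-phase "exactFirst" lookup in which phase 1
 * records exact-match outputs and phase 2 consults that record.
 */
public class ExactFirstSketch {
    /** analyzedKey models the FST input; surfaceForm models "output2". */
    record Entry(String analyzedKey, String surfaceForm, long weight) {}

    static List<String> lookup(List<Entry> entries, String analyzedQuery, int num) {
        List<String> results = new ArrayList<>();
        Set<String> exactOutputs = new HashSet<>();
        // Phase 1: exact matches on the analyzed key, highest weight first;
        // remember their outputs for phase 2.
        entries.stream()
            .filter(e -> e.analyzedKey().equals(analyzedQuery))
            .sorted((a, b) -> Long.compare(b.weight(), a.weight()))
            .limit(num)
            .forEach(e -> {
                exactOutputs.add(e.surfaceForm());
                results.add(e.surfaceForm());
            });
        // Phase 2: prefix matches by weight, skipping outputs phase 1 returned.
        entries.stream()
            .filter(e -> e.analyzedKey().startsWith(analyzedQuery))
            .sorted((a, b) -> Long.compare(b.weight(), a.weight()))
            .filter(e -> !exactOutputs.contains(e.surfaceForm()))
            .limit(num - results.size())
            .forEach(e -> results.add(e.surfaceForm()));
        return results;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("foo", "foo", 5),
            new Entry("foobar", "foobar", 10),
            new Entry("food", "food", 7));
        System.out.println(lookup(entries, "foo", 2)); // [foo, foobar]
    }
}
```

In this model, an entry indexed with the bug would carry a stale key such as "foo <HOLE>" for "foo the"; a fixed query analyzed to "foo" would then miss it in phase 1 but still find it as a prefix match in phase 2, which mirrors the "minor loss" described above.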
> TokenStreamToAutomaton doesn't ignore trailing posInc when
> preservePositionIncrements=false
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-8344
> URL: https://issues.apache.org/jira/browse/LUCENE-8344
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/suggest
> Reporter: David Smiley
> Priority: Major
> Attachments: LUCENE-8344.patch, LUCENE-8344.patch
>
>
> TokenStreamToAutomaton in Lucene core is used by the AnalyzingSuggester
> (incl. the FuzzySuggester subclass), the NRT Document Suggester, and soon the
> SolrTextTagger. It has a setting {{preservePositionIncrements}} defaulting
> to true. If it's set to false (e.g. to ignore stopwords) and there is a
> _trailing_ position increment greater than 1, TS2A will _still_ add position
> increments (holes) to the automaton even though it was configured not to.
> I'm filing this issue separately from LUCENE-8332, where I first found it. The
> fix is very simple, but I'm concerned about back-compat ramifications, so I'm
> filing it separately. I'll attach a patch to show the problem.
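The reported behavior can be modeled with a small sketch. This is a hypothetical toy, not TokenStreamToAutomaton itself, and it makes no claim about how the attached patch is written: tokens carry a position increment, the end-of-stream increment (nonzero when trailing stopwords were removed) is passed separately, and holes are emitted as `<HOLE>` labels in a flat list instead of automaton transitions. In this model a trailing increment of N yields N holes. The `withFix` flag toggles between the reported behavior (trailing holes added regardless of the setting) and the expected one (no holes at all when `preservePosIncs` is false). All names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of turning position increments into "hole" labels.
 * Not Lucene's TokenStreamToAutomaton; it only illustrates the trailing
 * position increment issue.
 */
public class TrailingHoleSketch {
    static final String HOLE = "<HOLE>"; // hypothetical hole marker

    /** A token and its position increment relative to the previous token. */
    record Token(String term, int posInc) {}

    /**
     * @param finalPosInc     increment reported at end of stream
     *                        (> 0 when trailing stopwords were removed)
     * @param preservePosIncs when false, no holes should be emitted
     * @param withFix         false reproduces the reported bug: trailing
     *                        holes are added regardless of preservePosIncs
     */
    static List<String> toLabels(List<Token> tokens, int finalPosInc,
                                 boolean preservePosIncs, boolean withFix) {
        List<String> labels = new ArrayList<>();
        for (Token t : tokens) {
            if (preservePosIncs) {
                // Each skipped position before this token becomes a hole.
                for (int i = 1; i < t.posInc(); i++) labels.add(HOLE);
            }
            labels.add(t.term());
        }
        // Trailing increment: the buggy path ignores preservePosIncs.
        boolean addTrailingHoles = withFix ? preservePosIncs : true;
        if (addTrailingHoles) {
            for (int i = 0; i < finalPosInc; i++) labels.add(HOLE);
        }
        return labels;
    }

    public static void main(String[] args) {
        // "foo the" with "the" removed: one token, trailing increment of 1.
        List<Token> tokens = List.of(new Token("foo", 1));
        System.out.println(toLabels(tokens, 1, false, false)); // [foo, <HOLE>]
        System.out.println(toLabels(tokens, 1, false, true));  // [foo]
    }
}
```

Interior holes are already suppressed when `preservePosIncs` is false; only the trailing case differs, which is why an index built before the fix could contain keys that a fixed query-time conversion no longer produces.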
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)