[
https://issues.apache.org/jira/browse/LUCENE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500766#comment-16500766
]
Jim Ferenczi commented on LUCENE-8344:
--------------------------------------
{quote}
org.apache.lucene.search.suggest.document.TestPrefixCompletionQuery#testAnalyzerWithSepAndNoPreservePos
see "test trailing stopword with a new document"
{quote}
If you index with preservePositionIncrements=false, you cannot match a query
that preserves position increments and contains a stop word. This is
expected: "baz the" indexed with preservePositionIncrements=false cannot match
the query "baz the" if the query preserves position increments. However, the
query "baz" should match both with and without preserving position increments.
This is why I said that the completion field (and all the related queries)
should be fine with this change. It works without reindexing.
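To make the reasoning above concrete, here is a rough, self-contained sketch.
It is a toy string model, not Lucene code: the class PosIncSketch, the method
path, the flag trimTrailing (standing in for the proposed fix) and the '_' and
'*' symbols (standing in for POS_SEP and a hole) are all assumptions made for
illustration. It also assumes that a completion query behaves like a plain
prefix match on the analyzed form.

```java
/** Toy model (not Lucene code) of how position increments end up in an analyzed path. */
final class PosIncSketch {
    static final char SEP = '_';   // hypothetical stand-in for POS_SEP
    static final char HOLE = '*';  // hypothetical stand-in for a hole

    /**
     * Builds a string path from terms and their position increments.
     * trailingPosInc models the increment reported after the last token,
     * e.g. 1 when a trailing stop word was removed by the analyzer.
     * trimTrailing models the proposed fix: drop trailing holes when
     * position increments are not preserved.
     */
    static String path(String[] terms, int[] posIncs, int trailingPosInc,
                       boolean preservePosInc, boolean trimTrailing) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            int inc = posIncs[i];
            if (!preservePosInc && inc > 1) {
                inc = 1; // collapse gaps left by removed tokens
            }
            if (i > 0) {
                sb.append(SEP);
            }
            for (int h = 1; h < inc; h++) {
                sb.append(HOLE).append(SEP); // one hole per skipped position
            }
            sb.append(terms[i]);
        }
        int end = trailingPosInc;
        if (!preservePosInc && trimTrailing) {
            end = 0; // the behavior the fix is meant to restore
        }
        for (int h = 0; h < end; h++) {
            sb.append(SEP).append(HOLE);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] baz = {"baz"};
        int[] one = {1};
        // "baz the" after stop word removal: one token plus a trailing increment of 1.
        String oldIndexed = path(baz, one, 1, false, false); // index built before the fix
        String newIndexed = path(baz, one, 1, false, true);  // index built after the fix
        String queryStopPreserve = path(baz, one, 1, true, false); // query "baz the", preserving
        String queryBaz = path(baz, one, 0, true, false);          // query "baz"
        System.out.println(newIndexed.startsWith(queryStopPreserve));
        System.out.println(oldIndexed.startsWith(queryBaz));
        System.out.println(newIndexed.startsWith(queryBaz));
    }
}
```

In this model the query "baz" is a prefix of both the old and the new indexed
form, which is the sense in which no reindexing is needed, while a query that
keeps the trailing hole cannot match a form that has none.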
{quote}
org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest#testStandard
see the "round trip" test
With BUG==true: fails (bad for back-compat)
With BUG==false: passes (therefore a reindex fixes)
{quote}
This one is trickier because the suggester tries to find an exact match first,
so the indexed version and the query version must be identical; otherwise the
assertion at line 789 of AnalyzingSuggester fails. We can probably fix the
discrepancy by adding a BWC layer that removes the trailing POS_SEP from the
indexed version when sameSurfaceForm is called and preservePosInc is false.
WDYT?
This would remove the need to rebuild the FST on a version that contains the
fix.
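A minimal sketch of what such a BWC layer could look like, to help the
discussion. It is not the AnalyzingSuggester code and does not reproduce the
real sameSurfaceForm signature: the class TrailingSepBwcSketch, the methods
stripTrailingSep and sameForm, and the SEP byte value are hypothetical
stand-ins. Stripping every trailing separator rather than only one is also a
choice made for this sketch.

```java
import java.util.Arrays;

/** Hypothetical back-compat comparison; not the AnalyzingSuggester implementation. */
final class TrailingSepBwcSketch {
    static final byte SEP = 0x1f; // stand-in for POS_SEP, assumed for this sketch

    /** Returns a copy of the indexed form without any trailing SEP bytes. */
    static byte[] stripTrailingSep(byte[] indexed) {
        int end = indexed.length;
        while (end > 0 && indexed[end - 1] == SEP) {
            end--;
        }
        return Arrays.copyOf(indexed, end);
    }

    /**
     * Compares an indexed form with a query form. When position increments
     * are not preserved, a trailing SEP left in an index built before the fix
     * is ignored, so old and new forms compare as equal.
     */
    static boolean sameForm(byte[] indexed, byte[] query, boolean preservePosInc) {
        byte[] normalized = preservePosInc ? indexed : stripTrailingSep(indexed);
        return Arrays.equals(normalized, query);
    }

    public static void main(String[] args) {
        byte[] oldIndexed = {'b', 'a', 'z', SEP}; // form written before the fix
        byte[] query = {'b', 'a', 'z'};           // form produced after the fix
        System.out.println(sameForm(oldIndexed, query, false));
        System.out.println(sameForm(oldIndexed, query, true));
    }
}
```

The normalization is applied only to the indexed side and only when
preservePosInc is false, so comparisons that do preserve position increments
are left unchanged.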
> TokenStreamToAutomaton doesn't ignore trailing posInc when
> preservePositionIncrements=false
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-8344
> URL: https://issues.apache.org/jira/browse/LUCENE-8344
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/suggest
> Reporter: David Smiley
> Priority: Major
> Attachments: LUCENE-8344.patch, LUCENE-8344.patch
>
>
> TokenStreamToAutomaton in Lucene core is used by the AnalyzingSuggester
> (incl. the FuzzySuggester subclass) and the NRT Document Suggester, and soon
> the SolrTextTagger. It has a setting {{preservePositionIncrements}} defaulting
> to true. If it's set to false (e.g. to ignore stopwords) and if there is a
> _trailing_ position increment greater than 1, TS2A will _still_ add position
> increments (holes) into the automaton even though it was configured not to.
> I'm filing this issue separate from LUCENE-8332 where I first found it. The
> fix is very simple but I'm concerned about back-compat ramifications so I'm
> filing it separately. I'll attach a patch to show the problem.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)