[ https://issues.apache.org/jira/browse/LUCENE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506649#comment-16506649 ]
David Smiley commented on LUCENE-8344:
--------------------------------------

The patch may be hard to review as a diff. There are now 3 tests in TestPrefixCompletionQuery that share the same data and queries but differ in their expected results, depending on the CompletionAnalyzer settings. I think this may be hard to maintain as-is; it ought to be one test, so that there is less duplication and it becomes easier to see how a change in settings adjusts the expectations. But hopefully you all think it's fine as is.

After some reflection, I figured that if preserveSep=false, then preservePositionIncrements is irrelevant, which is why we have one fewer test method than 2x2 would suggest. This ought to throw an exception to the user. Perhaps 3 factory methods would be better than the one constructor with two booleans? There's likely an analogous situation with AnalyzingSuggester's long constructor. Anyway, this proposal doesn't belong in this issue.

Suggested CHANGES.txt notes:

* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing position increment when the preservePositionIncrements setting is false. (David Smiley, Jim Ferenczi)

Upgrading _(a new section)_

* LUCENE-8344: If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you explicitly use the preservePositionIncrements=false setting (not the default), then you ought to rebuild your suggester index. If you don't, queries or indexed data with trailing position gaps (e.g. stop words) may not work correctly.
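To make the factory-method idea from the comment above more concrete, here is a minimal sketch of how three named factories could replace a constructor taking two booleans. Everything in it is hypothetical: `CompletionSettings` and its method names do not exist in Lucene, and pinning preservePositionIncrements to false when preserveSep=false is an assumption based only on the remark that the flag is irrelevant in that case.

```java
// Hypothetical sketch only: neither this class nor these method names exist in Lucene.
final class CompletionSettings {
    final boolean preserveSep;
    final boolean preservePositionIncrements;

    // Private, so callers cannot build the combination the comment calls meaningless.
    private CompletionSettings(boolean preserveSep, boolean preservePositionIncrements) {
        this.preserveSep = preserveSep;
        this.preservePositionIncrements = preservePositionIncrements;
    }

    /** Keep token separators and position gaps (both flags true). */
    static CompletionSettings preserveSepAndGaps() {
        return new CompletionSettings(true, true);
    }

    /** Keep token separators but ignore position gaps, e.g. removed stop words. */
    static CompletionSettings preserveSepIgnoreGaps() {
        return new CompletionSettings(true, false);
    }

    /** Drop separators. Assumption: the gap flag is irrelevant here, so it is pinned to false. */
    static CompletionSettings ignoreSep() {
        return new CompletionSettings(false, false);
    }

    @Override
    public String toString() {
        return "preserveSep=" + preserveSep
            + ", preservePositionIncrements=" + preservePositionIncrements;
    }

    public static void main(String[] args) {
        System.out.println(preserveSepAndGaps());
        System.out.println(preserveSepIgnoreGaps());
        System.out.println(ignoreSep());
    }
}
```

With only three ways to obtain an instance, the fourth combination cannot be expressed, so no runtime exception would be needed for it. Whether that is preferable to a validating constructor is a design question for a separate issue.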
> TokenStreamToAutomaton doesn't ignore trailing posInc when preservePositionIncrements=false
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-8344
> URL: https://issues.apache.org/jira/browse/LUCENE-8344
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/suggest
> Reporter: David Smiley
> Priority: Major
> Attachments: LUCENE-8344.patch, LUCENE-8344.patch, LUCENE-8344.patch
>
>
> TokenStreamToAutomaton in Lucene core is used by the AnalyzingSuggester
> (incl. FuzzySuggester subclass) and NRT Document Suggester and soon the
> SolrTextTagger. It has a setting {{preservePositionIncrements}} defaulting
> to true. If it's set to false (e.g. to ignore stopwords) and if there is a
> _trailing_ position increment greater than 1, TS2A will _still_ add position
> increments (holes) into the automata even though it was configured not to.
> I'm filing this issue separate from LUCENE-8332 where I first found it. The
> fix is very simple but I'm concerned about back-compat ramifications so I'm
> filing it separately. I'll attach a patch to show the problem.
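The behavior described in the issue can be illustrated with a toy model. This is not Lucene's TokenStreamToAutomaton code; it is a simplified, self-contained sketch that flattens a token stream into a list of labels, with "_" standing in for a hole. The class name, the `ignoreTrailing` switch, and the sample increments are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's TokenStreamToAutomaton. All names here are hypothetical.
final class TrailingGapModel {
    static final String HOLE = "_";

    /**
     * @param tokens         the tokens that survived analysis
     * @param posIncs        position increment of each token (1 = directly after the previous one)
     * @param endPosInc      trailing increment reported after the last token
     * @param preserve       models the preservePositionIncrements setting
     * @param ignoreTrailing false models the reported behavior, true models the expected behavior
     */
    static List<String> path(String[] tokens, int[] posIncs, int endPosInc,
                             boolean preserve, boolean ignoreTrailing) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // Gaps between tokens become holes only when increments are preserved.
            int holes = preserve ? posIncs[i] - 1 : 0;
            for (int h = 0; h < holes; h++) {
                out.add(HOLE);
            }
            out.add(tokens[i]);
        }
        // The trailing gap is where the reported and expected behavior differ.
        int trailing = (!preserve && ignoreTrailing) ? 0 : endPosInc;
        for (int h = 0; h < trailing; h++) {
            out.add(HOLE);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"wifi", "network"};
        int[] posIncs = {1, 1};
        int endPosInc = 2; // made-up value standing in for a trailing position gap
        System.out.println(path(tokens, posIncs, endPosInc, false, false)); // [wifi, network, _, _]
        System.out.println(path(tokens, posIncs, endPosInc, false, true));  // [wifi, network]
    }
}
```

In this model, with preservation switched off, internal gaps are already dropped in both variants; only the trailing holes distinguish the reported behavior from the expected one.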