[jira] [Commented] (LUCENE-8344) TokenStreamToAutomaton doesn't ignore trailing posInc when preservePositionIncrements=false

David Smiley (JIRA) Fri, 08 Jun 2018 15:30:09 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506649#comment-16506649
 ]


David Smiley commented on LUCENE-8344:
--------------------------------------

The patch may be hard to review as a diff.  There are now 3 tests in 
TestPrefixCompletionQuery that use the same data and queries but differ in 
their expected results, depending on the CompletionAnalyzer settings.  I think 
this may be hard to maintain as is; it ought to be one test so that there is 
less duplication and it is easier to see how a change in settings changes the 
expectations.  But hopefully you all think it's fine as is.

After some reflection, I figured that if preserveSep=false, then 
preservePositionIncrements is irrelevant, and so that's why we have one fewer 
test method than 2x2 would suggest.  A configuration where one setting is 
silently ignored ought to throw an exception to the user.  Perhaps 3 factory 
methods would be better than the one constructor with two booleans?  There's 
likely an analogous situation with AnalyzingSuggester's long constructor.  
Anyway, this proposal doesn't belong in this issue.
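
A rough sketch of the three-factory-method idea, for illustration only.  The 
class name CompletionSettings and the factory method names are hypothetical 
and do not exist in Lucene; the sketch only shows how three named factories 
could replace a constructor taking two booleans, so that the irrelevant 
combination cannot be expressed at all.

```java
/** Hypothetical settings holder; NOT an existing Lucene class. */
public final class CompletionSettings {
    private final boolean preserveSep;
    private final boolean preservePositionIncrements;

    // Private: callers must go through one of the three named factories.
    private CompletionSettings(boolean preserveSep, boolean preservePositionIncrements) {
        this.preserveSep = preserveSep;
        this.preservePositionIncrements = preservePositionIncrements;
    }

    /** Separators kept, position gaps kept. */
    public static CompletionSettings keepSeparatorsAndGaps() {
        return new CompletionSettings(true, true);
    }

    /** Separators kept, position gaps ignored. */
    public static CompletionSettings keepSeparatorsIgnoreGaps() {
        return new CompletionSettings(true, false);
    }

    /**
     * Separators dropped. Per the comment above, the position increment
     * setting is assumed to be irrelevant here, so the caller cannot set it;
     * the value stored below is an arbitrary placeholder.
     */
    public static CompletionSettings ignoreSeparators() {
        return new CompletionSettings(false, false);
    }

    public boolean preserveSep() {
        return preserveSep;
    }

    public boolean preservePositionIncrements() {
        return preservePositionIncrements;
    }
}
```

With this shape there is nothing to validate at runtime, so no exception is 
needed for the meaningless case.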

Suggested CHANGES.txt notes:
* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not 
ignoring a trailing position increment when the preservePositionIncrements 
setting is false.  (David Smiley, Jim Ferenczi)

Upgrading _(a new section)_
* LUCENE-8344: If you are using the AnalyzingSuggester or its FuzzySuggester 
subclass, and if you explicitly use the preservePositionIncrements=false 
setting (not the default), then you ought to rebuild your suggester index.  If 
you don't, queries or indexed data with trailing position gaps (e.g. stop 
words) may not work correctly.
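
To illustrate what a trailing position gap is, here is a toy model.  It is not 
Lucene code and does not use TokenStreamToAutomaton; the class 
TrailingGapDemo, its flatten method, and the "_" hole marker are all invented 
for this sketch.  It flattens a token stream into a list of symbols and shows 
how a trailing hole could still appear before the fix even though gaps were 
supposed to be ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; all names here are hypothetical, not Lucene API. */
public final class TrailingGapDemo {
    static final String HOLE = "_";

    /**
     * tokens[i] is preceded by gapsBefore[i] skipped positions (e.g. removed
     * stop words); trailingGaps skipped positions follow the last token.
     * trailingFixApplied=false models the behavior described in this issue.
     */
    static List<String> flatten(String[] tokens, int[] gapsBefore, int trailingGaps,
                                boolean preserveGaps, boolean trailingFixApplied) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (preserveGaps) {
                addHoles(out, gapsBefore[i]);
            }
            out.add(tokens[i]);
        }
        // Before the fix, trailing holes are emitted regardless of the setting.
        if (preserveGaps || !trailingFixApplied) {
            addHoles(out, trailingGaps);
        }
        return out;
    }

    private static void addHoles(List<String> out, int count) {
        for (int i = 0; i < count; i++) {
            out.add(HOLE);
        }
    }

    public static void main(String[] args) {
        // Models "wizard of oz the" with "of" and "the" removed as stop words.
        String[] tokens = {"wizard", "oz"};
        int[] gapsBefore = {0, 1};
        System.out.println(flatten(tokens, gapsBefore, 1, true, true));   // [wizard, _, oz, _]
        System.out.println(flatten(tokens, gapsBefore, 1, false, false)); // [wizard, oz, _]
        System.out.println(flatten(tokens, gapsBefore, 1, false, true));  // [wizard, oz]
    }
}
```

In this model, data flattened before the fix ([wizard, oz, _]) no longer 
matches what the fixed code produces ([wizard, oz]), which is the reason for 
the rebuild advice.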

> TokenStreamToAutomaton doesn't ignore trailing posInc when 
> preservePositionIncrements=false
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8344
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8344
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/suggest
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8344.patch, LUCENE-8344.patch, LUCENE-8344.patch
>
>
> TokenStreamToAutomaton in Lucene core is used by the AnalyzingSuggester 
> (incl. FuzzySuggester subclass) and NRT Document Suggester and soon the 
> SolrTextTagger.  It has a setting {{preservePositionIncrements}} defaulting 
> to true.  If it's set to false (e.g. to ignore stopwords) and if there is a 
> _trailing_ position increment greater than 1, TS2A will _still_ add position 
> increments (holes) into the automaton even though it was configured not to.
> I'm filing this issue separate from LUCENE-8332 where I first found it.  The 
> fix is very simple but I'm concerned about back-compat ramifications so I'm 
> filing it separately.  I'll attach a patch to show the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

