Jim Ferenczi created LUCENE-7708:
------------------------------------

             Summary: Track PositionLengthAttribute abuse
                 Key: LUCENE-7708
                 URL: https://issues.apache.org/jira/browse/LUCENE-7708
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/queryparser, modules/analysis
            Reporter: Jim Ferenczi


Some token filters use the position length attribute of the token stream to 
encode the number of terms they put in a single token. 
This breaks query parsing because it creates a disconnected graph. 
I've tracked the abuse down to 2 candidates:
* ShingleFilter which sets the position length attribute to the length of the 
shingle.
* CJKBigramFilter which always sets the position length attribute to 2.
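
To illustrate why this disconnects the graph, here is a minimal sketch in plain 
Java. It does not use the Lucene API: TokenGraphCheck, Token and isConnected are 
hypothetical names, and the two bigrams for "a b c" (emitted without unigrams) 
are an assumed example rather than output captured from ShingleFilter. Each 
token is treated as an edge from its start position to start + position length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenGraphCheck {

    /** Hypothetical stand-in for a term plus its position attributes. */
    static final class Token {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions the token claims to span

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns true if the tokens form a graph in which every edge lies on a path from start to end. */
    static boolean isConnected(List<Token> tokens) {
        List<int[]> edges = new ArrayList<>();
        Set<Integer> starts = new HashSet<>();
        int pos = -1; // the first token, with posInc = 1, starts at position 0
        int end = 0;  // final node: the largest end position seen
        for (Token t : tokens) {
            pos += t.posInc;
            edges.add(new int[] {pos, pos + t.posLen});
            starts.add(pos);
            end = Math.max(end, pos + t.posLen);
        }

        // Forward reachability from node 0.
        Set<Integer> reachable = new HashSet<>();
        reachable.add(0);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int[] e : edges) {
                if (reachable.contains(e[0]) && reachable.add(e[1])) {
                    changed = true;
                }
            }
        }

        for (int[] e : edges) {
            if (!reachable.contains(e[0])) {
                return false; // this token can never be reached from the start
            }
            if (e[1] != end && !starts.contains(e[1])) {
                return false; // this token ends on a node with no outgoing token
            }
        }
        return reachable.contains(end);
    }

    public static void main(String[] args) {
        // Bigrams of "a b c" with the position length set to the shingle size.
        List<Token> abused = Arrays.asList(new Token("a b", 1, 2), new Token("b c", 1, 2));
        // The same bigrams with the default position length of 1.
        List<Token> plain = Arrays.asList(new Token("a b", 1, 1), new Token("b c", 1, 1));
        System.out.println("posLen=2: " + isConnected(abused)); // prints "posLen=2: false"
        System.out.println("posLen=1: " + isConnected(plain));  // prints "posLen=1: true"
    }
}
```

With the position length set to 2, "a b" spans 0 -> 2 and "b c" spans 1 -> 3, so 
nothing leads into node 1 and nothing leaves node 2. With a position length of 1 
the same tokens form the simple chain 0 -> 1 -> 2.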

I don't think these filters should set the position length at all, so the best 
fix would be to remove the attribute from these token filters, but this could 
break backward compatibility (BWC).
This is still a serious bug, though, since shingles and CJK bigrams now produce 
invalid queries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

