[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880422#comment-16880422
 ] 

Adrien Grand commented on LUCENE-4312:
--------------------------------------

Recording position lengths in the index is the easy part of the problem in my 
opinion. I'm concerned that this will introduce significant complexity to 
phrase queries (they will require backtracking in order to deal with the case 
that a term exists twice at the same position with different position lengths), 
and even make sloppy phrase queries and their spans/intervals counterparts 
meaningless (as terms could be very distant according to the index only because 
there is one term in-between that has a multi-term synonym indexed). 

> Index format to store position length per position
> --------------------------------------------------
>
>                 Key: LUCENE-4312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4312
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 6.0
>            Reporter: Gang Luo
>            Priority: Minor
>              Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to