[ https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880422#comment-16880422 ]
Adrien Grand commented on LUCENE-4312:
--------------------------------------

Recording position lengths in the index is the easy part of the problem in my opinion. I'm concerned that this will introduce significant complexity to phrase queries (they will require backtracking in order to deal with the case that a term exists twice at the same position with different position lengths), and even make sloppy phrase queries and their spans/intervals counterparts meaningless (as terms could be very distant according to the index only because there is one term in-between that has a multi-term synonym indexed).

> Index format to store position length per position
> --------------------------------------------------
>
>                 Key: LUCENE-4312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4312
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 6.0
>            Reporter: Gang Luo
>            Priority: Minor
>              Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said: TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute. Need change the index format (and
> Codec APIs) to store an additional int position length per position.
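
To illustrate the backtracking concern raised in the comment, here is a minimal sketch of exact phrase matching over a token graph. It is not Lucene code: the `Postings` layout, the `phrase_matches` function, and the `example_postings` data are all hypothetical, and the sketch assumes each occurrence is stored as a (position, position length) pair, so that the next phrase term must start at position + position length.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory postings for a single document (not a Lucene API):
# term -> list of (position, position_length) occurrences.
Postings = Dict[str, List[Tuple[int, int]]]


def phrase_matches(postings: Postings, phrase: List[str]) -> bool:
    """Return True if the phrase terms form a connected path in the token graph."""

    def match_from(term_index: int, start: int) -> bool:
        # All phrase terms have been placed on a connected path.
        if term_index == len(phrase):
            return True
        for position, length in postings.get(phrase[term_index], []):
            if position != start:
                continue
            # The same term may occur at this position with several position
            # lengths. Each one is a separate branch; if a branch fails, the
            # loop falls through to the next one (this is the backtracking).
            if match_from(term_index + 1, position + length):
                return True
        return False

    if not phrase:
        return False
    starts = {position for position, _ in postings.get(phrase[0], [])}
    return any(match_from(0, start) for start in starts)


# Made-up data: "a" occurs twice at position 0, once with position length 1
# and once with position length 3 (for example, as part of a multi-term synonym).
example_postings: Postings = {
    "a": [(0, 1), (0, 3)],
    "b": [(1, 1)],
    "x": [(2, 1)],
    "c": [(3, 1)],
}
```

With this data, `phrase_matches(example_postings, ["a", "c"])` succeeds only through the second occurrence of "a" (position length 3); a matcher that commits to the first occurrence and never revisits the choice would miss it. The same data also hints at the sloppy-query concern: "a" and "c" are adjacent in the graph, yet their raw positions differ by 3.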