scampi opened a new pull request, #13315:
URL: https://github.com/apache/lucene/pull/13315

   - **test: add unit tests to reproduce the IndexOutOfBoundsException in 
DefaultPassageFormatter**
   - **doc: clarify javadoc of MatchesIterator**
   - **fix: sort passages by offset if positions are missing**
   
   Fix #12431
   
   ### Description
   
   `DefaultPassageFormatter` may throw an `IndexOutOfBoundsException` when the 
matches of a passage are out of order.
   This happens when term vectors are stored without positions.
   
   The tentative fix orders matches by offset in `DisjunctionMatchesIterator` 
when positions are unavailable.
   However, I wonder whether it is correct to create such an iterator at all 
when positions are not available. Matches with no terms are explicitly removed, 
and the javadoc of `MATCH_WITH_NO_TERMS` says that it indicates a match with 
_no term positions_. Going by that doc, it shouldn't be possible to highlight 
text without positions. Is that javadoc correct? Should it be updated?
   
   
https://github.com/apache/lucene/blob/bc678ac67e32c55a27a4e8950c25144cc89cef66/lucene/core/src/java/org/apache/lucene/search/MatchesUtils.java#L67
   
   
https://github.com/apache/lucene/blob/bc678ac67e32c55a27a4e8950c25144cc89cef66/lucene/core/src/java/org/apache/lucene/search/MatchesUtils.java#L40-L44
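
The failure mode and the offset-ordering idea described above can be sketched in isolation. This is a simplified stand-in and not Lucene's actual code: the class `OffsetOrderSketch` and the methods `highlight` and `sortByOffset` are hypothetical names, and matches are modelled as plain `[start, end)` offset pairs rather than a `MatchesIterator`.

```java
import java.util.Arrays;
import java.util.Comparator;

public class OffsetOrderSketch {

  /**
   * Wraps each [start, end) match in <b> tags.
   * Like many formatters, it assumes matches arrive sorted by start offset.
   */
  public static String highlight(String content, int[][] matches) {
    StringBuilder sb = new StringBuilder();
    int pos = 0;
    for (int[] m : matches) {
      int start = m[0];
      int end = m[1];
      // If start < pos (matches out of order), this append is given an
      // invalid range and throws IndexOutOfBoundsException.
      sb.append(content, pos, start);
      sb.append("<b>").append(content, start, end).append("</b>");
      pos = end;
    }
    sb.append(content, pos, content.length());
    return sb.toString();
  }

  /** Returns a copy of the matches ordered by start offset, then end offset. */
  public static int[][] sortByOffset(int[][] matches) {
    int[][] sorted = matches.clone();
    Arrays.sort(
        sorted,
        Comparator.<int[]>comparingInt(m -> m[0]).thenComparingInt(m -> m[1]));
    return sorted;
  }

  public static void main(String[] args) {
    String content = "foo bar baz";
    // Hypothetical input: the match on "baz" is reported before the one on "foo".
    int[][] outOfOrder = {{8, 11}, {0, 3}};

    try {
      highlight(content, outOfOrder);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unsorted matches failed: " + e.getMessage());
    }

    // Ordering by offset first lets the same formatter succeed.
    System.out.println(highlight(content, sortByOffset(outOfOrder)));
  }
}
```

The sketch sorts in the caller rather than in the formatter, which mirrors the choice in this PR of fixing the order at the iterator level so that consumers can keep assuming ordered matches.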
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

