[jira] [Updated] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.
[ https://issues.apache.org/jira/browse/LUCENE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3087:
---
Fix Version/s: 4.0, 3.2

highlighting exact phrase with overlapping tokens fails.

Key: LUCENE-3087
URL: https://issues.apache.org/jira/browse/LUCENE-3087
Project: Lucene - Java
Issue Type: Bug
Components: contrib/highlighter
Affects Versions: 2.9.4, 3.1
Reporter: Pierre Gossé
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3087.patch

Fields with overlapping tokens are not highlighted in search results when searching for exact phrases with TermVector.WITH_OFFSET.

The document built in MemoryIndex for highlighting does not preserve token positions in this case. Overlapping tokens get flattened (the position increment is always set to 1), so the span query used to find the relevant fragment fails to identify the correct token sequence because of the position shift.

I corrected this by adding a position increment calculation in the subclass StoredTokenStream, and I added a JUnit test covering this case.

I used the Eclipse code style from trunk, but it introduces quite a few formatting differences between the repository and working copy files. I tried to reduce them, but some line-wrapping rules still don't match.

Correction patch attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.
[ https://issues.apache.org/jira/browse/LUCENE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Gossé updated LUCENE-3087:
---
Attachment: LUCENE-3087.patch

Correction patch with JUnit tests.

highlighting exact phrase with overlapping tokens fails.

Key: LUCENE-3087
URL: https://issues.apache.org/jira/browse/LUCENE-3087
Project: Lucene - Java
Issue Type: Bug
Components: contrib/highlighter
Affects Versions: 2.9.4, 3.1
Reporter: Pierre Gossé
Priority: Minor
Attachments: LUCENE-3087.patch
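The position shift described in the issue can be illustrated with a small, Lucene-free sketch. It is not the code from LUCENE-3087.patch: the class `PositionIncrementSketch`, the `StoredToken` type, and both method names are hypothetical, and the rule used here (a token that starts at the same offset as the previous token gets increment 0, any other token gets 1) is only one plausible way to derive increments when offsets are the only stored information.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the StoredTokenStream from the patch. */
final class PositionIncrementSketch {

    /** Minimal stand-in for a token rebuilt from a term vector: text plus offsets. */
    static final class StoredToken {
        final String term;
        final int startOffset;
        final int endOffset;

        StoredToken(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Computes one position increment per token. Tokens must be sorted by start offset.
     * Assumption: a token sharing its start offset with the previous token overlaps it
     * and gets increment 0; every other token gets increment 1.
     */
    static int[] positionIncrements(List<StoredToken> tokens) {
        int[] increments = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            boolean overlapsPrevious =
                i > 0 && tokens.get(i).startOffset == tokens.get(i - 1).startOffset;
            increments[i] = overlapsPrevious ? 0 : 1;
        }
        return increments;
    }

    /** Turns increments into absolute positions, starting from -1 so the first token lands on 0. */
    static int[] positions(int[] increments) {
        int[] positions = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args) {
        // "the fox jumped", with a second token injected over "fox" at the same offsets.
        List<StoredToken> tokens = Arrays.asList(
            new StoredToken("the", 0, 3),
            new StoredToken("fox", 4, 7),
            new StoredToken("vulpes", 4, 7),
            new StoredToken("jumped", 8, 14));

        int[] increments = positionIncrements(tokens);
        System.out.println(Arrays.toString(increments));            // [1, 1, 0, 1]
        System.out.println(Arrays.toString(positions(increments))); // [0, 1, 1, 2]
    }
}
```

With every increment forced to 1, the same tokens would land on positions 0, 1, 2, 3. The phrase "fox jumped" expects "jumped" one position after "fox", which no longer holds once "vulpes" takes position 2, so a span query over the flattened stream would find no match.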