[ https://issues.apache.org/jira/browse/LUCENE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600918#comment-13600918 ]
Luca Cavanna commented on LUCENE-4825:
--------------------------------------

Hey Robert,

sorry, but I don't quite understand why it would become an orange. :)

I mean, the PostingsHighlighter does (among other things) two great things:
1) it reads offsets from the postings list, as its name says
2) it summarizes the content, giving nice sentences as output

I think these two features are a great improvement and pretty much what everybody would like to have! I'm proposing to add support for positional queries as a third, optional feature. We would need to read the spans from the positional queries in order to highlight only the proper terms; otherwise the output is wrong from a user's perspective. Would this make it that much slower? I don't mean reanalyzing the text...

Don't get me wrong, you must be right, but I would like to understand more. You're saying that instead of adding 3) to 1) and 2), we should have another highlighter that does 1), 2) and 3)?

> PostingsHighlighter support for positional queries
> --------------------------------------------------
>
>                 Key: LUCENE-4825
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4825
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 4.2
>            Reporter: Luca Cavanna
>
> I've been playing around with the brand new PostingsHighlighter. I'm really
> happy with the result in terms of quality of the snippets and performance.
> On the other hand, I noticed it doesn't support positional queries. If you
> make a span query, for example, all the single terms will be highlighted,
> even though they haven't contributed to the match. That reminds me of the
> difference between the QueryTermScorer and the QueryScorer (using the
> standard Highlighter).
> I've been trying to adapt what the QueryScorer does, especially the
> extraction of the query terms together with their positions (what
> WeightedSpanTermExtractor does).
> The next step would be to take that information
> into account within the formatter and highlight only the terms that actually
> contributed to the match. I'm not quite ready yet with a patch to contribute
> this back, but I certainly intend to do so. That's why I opened the issue, and
> in the meantime I would like to hear what you guys think about it and
> discuss how best we can fix it. I think it would be a big improvement for
> this new highlighter, which is already great!
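
To make the behaviour described in the issue concrete, here is a toy sketch in plain Python. It does not use Lucene and is not how PostingsHighlighter or WeightedSpanTermExtractor are implemented; the function names (`highlight_all_terms`, `highlight_span_near`, `render`), the whitespace tokenization, and the sample sentence are all assumptions made up for illustration. It only contrasts highlighting every occurrence of the query terms with highlighting the occurrences that take part in an ordered "near" match.

```python
from typing import Iterable, List, Set


def term_positions(tokens: List[str], term: str) -> List[int]:
    """Return the token positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def highlight_all_terms(tokens: List[str], terms: Iterable[str]) -> Set[int]:
    """Term-only highlighting: mark every occurrence of any query term."""
    wanted = set(terms)
    return {i for i, tok in enumerate(tokens) if tok in wanted}


def highlight_span_near(tokens: List[str], first: str, second: str, slop: int) -> Set[int]:
    """Position-aware highlighting (toy model of an ordered 'near' query).

    Marks only occurrences where `first` is followed by `second` with at
    most `slop` other tokens in between.
    """
    hits: Set[int] = set()
    for i in term_positions(tokens, first):
        for j in term_positions(tokens, second):
            if 0 < j - i <= slop + 1:
                hits.update((i, j))
    return hits


def render(tokens: List[str], positions: Set[int]) -> str:
    """Wrap the tokens at the given positions in <b>...</b>."""
    return " ".join(f"<b>{t}</b>" if i in positions else t for i, t in enumerate(tokens))


tokens = "quick brown fox and a slow fox met a quick dog".split()

# Every "quick" and every "fox" is marked, including those outside the match.
print(render(tokens, highlight_all_terms(tokens, ["quick", "fox"])))

# Only the "quick ... fox" pair within one token of slop is marked.
print(render(tokens, highlight_span_near(tokens, "quick", "fox", slop=1)))
```

In this toy model the first call marks four tokens and the second marks only the pair that satisfies the positional constraint, which is the difference the issue asks the highlighter to respect.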