FastVectorHighlighter: IDF-weighted terms for ordered fragments 
----------------------------------------------------------------

                 Key: LUCENE-3440
                 URL: https://issues.apache.org/jira/browse/LUCENE-3440
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/highlighter
    Affects Versions: 3.5
            Reporter: S.L.
            Priority: Minor
             Fix For: 3.5


The FastVectorHighlighter gives every term found in a fragment the same weight.
As a result, fragments with many matching words, or in the worst case many very
common words, rank higher than fragments that contain *all* of the terms used
in the original query.

This patch orders fragments by IDF-weighted terms. Each unique query term found
in a fragment is counted once, weighted by its IDF and the query boost:

  total weight = total weight + IDF(term) * query boost
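
A minimal Java sketch of the weighting above follows. It uses plain collections
rather than the highlighter's internal classes. The names `IdfFragmentScorer`,
`equalWeightScore` and `idfWeightedScore` are hypothetical. The equal-weight
method is a simplified model of the current behaviour. The IDF formula
1 + ln(numDocs / (docFreq + 1)) is assumed here (the classic Lucene TF-IDF
form) and is not taken from the patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper comparing equal-weight and IDF-weighted fragment scores. */
final class IdfFragmentScorer {

    private IdfFragmentScorer() {}

    /** Assumed classic Lucene TF-IDF form: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Simplified model of equal weighting: every matching token occurrence adds 1. */
    static double equalWeightScore(List<String> fragmentTerms, Set<String> queryTerms) {
        double total = 0.0;
        for (String term : fragmentTerms) {
            if (queryTerms.contains(term)) {
                total += 1.0;
            }
        }
        return total;
    }

    /** total weight = total weight + IDF(term) * query boost, once per unique term. */
    static double idfWeightedScore(List<String> fragmentTerms, Set<String> queryTerms,
                                   Map<String, Integer> docFreqs, int numDocs,
                                   double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            // seen.add returns false for repeats, so each term counts once per fragment.
            if (queryTerms.contains(term) && seen.add(term)) {
                int docFreq = docFreqs.getOrDefault(term, 0);
                total = total + idf(docFreq, numDocs) * queryBoost;
            }
        }
        return total;
    }
}
```

Consider the query terms `lucene`, `highlighter` and `the` over 1,000 documents,
with document frequencies 10, 5 and 999. Under equal weighting, the fragment
"the the the lucene the" scores 5 and "lucene highlighter the" scores 3. With
IDF weighting the order flips. `the` contributes only 1 + ln(1) = 1, while the
rarer terms contribute roughly 5.5 and 6.1.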

The ranking formula should be the same as, or at least similar to, the one used
in org.apache.lucene.search.highlight.QueryTermScorer.

The patch is simple, but it works for us. 

Some ideas:
- A better approach would be to move the whole fragment scoring into a separate
class.
- Allow switching the scoring method via a parameter.
- Exact phrases should get an even better score, regardless of whether a
phrase query was executed.
- The edismax/dismax parameters pf, ps and pf^boost should be honored, and the
corresponding fragments should be ranked higher.
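
The first three ideas above could be combined roughly as sketched below. This
is hypothetical and is not an existing Lucene API. `FragmentScoringStrategy`,
its implementations and `FragmentScoringStrategies.forName` are invented names.
The name-based switch stands in for whatever parameter mechanism would be used.
The phrase bonus only detects exact, in-order occurrences of a given token
sequence.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical strategy interface: scores one fragment's tokens against the query terms. */
interface FragmentScoringStrategy {
    double score(List<String> fragmentTerms, Set<String> queryTerms);
}

/** Simplified model of equal weighting: every matching token occurrence adds 1. */
final class TermCountStrategy implements FragmentScoringStrategy {
    @Override
    public double score(List<String> fragmentTerms, Set<String> queryTerms) {
        double total = 0.0;
        for (String term : fragmentTerms) {
            if (queryTerms.contains(term)) {
                total += 1.0;
            }
        }
        return total;
    }
}

/** Each unique matching term counts once, weighted by idf(term) * boost. */
final class IdfStrategy implements FragmentScoringStrategy {
    private final Map<String, Integer> docFreqs;
    private final int numDocs;
    private final double boost;

    IdfStrategy(Map<String, Integer> docFreqs, int numDocs, double boost) {
        this.docFreqs = docFreqs;
        this.numDocs = numDocs;
        this.boost = boost;
    }

    @Override
    public double score(List<String> fragmentTerms, Set<String> queryTerms) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            if (queryTerms.contains(term) && seen.add(term)) {
                int df = docFreqs.getOrDefault(term, 0);
                // Assumed classic form: 1 + ln(numDocs / (docFreq + 1)).
                total += (1.0 + Math.log(numDocs / (double) (df + 1))) * boost;
            }
        }
        return total;
    }
}

/** Decorator: adds a bonus for every exact, in-order occurrence of a phrase. */
final class PhraseBonusStrategy implements FragmentScoringStrategy {
    private final FragmentScoringStrategy base;
    private final List<String> phrase;
    private final double bonus;

    PhraseBonusStrategy(FragmentScoringStrategy base, List<String> phrase, double bonus) {
        this.base = base;
        this.phrase = phrase;
        this.bonus = bonus;
    }

    @Override
    public double score(List<String> fragmentTerms, Set<String> queryTerms) {
        double total = base.score(fragmentTerms, queryTerms);
        int n = phrase.size();
        for (int i = 0; n > 0 && i + n <= fragmentTerms.size(); i++) {
            if (fragmentTerms.subList(i, i + n).equals(phrase)) {
                total += bonus;
            }
        }
        return total;
    }
}

/** Selects a strategy by name, standing in for a configuration parameter. */
final class FragmentScoringStrategies {
    static FragmentScoringStrategy forName(String name, Map<String, Integer> docFreqs,
                                           int numDocs, double boost) {
        switch (name) {
            case "count":
                return new TermCountStrategy();
            case "idf":
                return new IdfStrategy(docFreqs, numDocs, boost);
            default:
                throw new IllegalArgumentException("unknown scoring: " + name);
        }
    }
}
```

Because every strategy sits behind a single interface, the highlighter would
need only one scoring call site. Phrase bonuses compose with any base strategy
instead of being hard-wired into one formula.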

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
