[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

Koji Sekiguchi (Commented) (JIRA) Mon, 17 Oct 2011 04:26:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128795#comment-13128795 ]


Koji Sekiguchi commented on LUCENE-3440:
----------------------------------------

Hi sebastian,

{quote}
Frankly, I didn't run the tests because I thought the changes in the last patch
shouldn't affect the original behavior.
I'll look into it, but this may take some time because I'm not familiar with
the test framework.
{quote}

OK, no problem. I'll look at the test case (hopefully next week or so). In the
meantime, could you take care of the following so we can move forward?

{quote}
Ah, sebastian, I think you need to check "Grant license to ASF for inclusion
in ASF works" when you attach your patch. Can you remove the latest patches and
reattach them with that flag checked? Thanks!
{quote}

                
> FastVectorHighlighter: IDF-weighted terms for ordered fragments 
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3440
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3440
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 3.5, 4.0
>            Reporter: sebastian L.
>            Priority: Minor
>              Labels: FastVectorHighlighter
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3.5-SNAPSHOT-3440-8.patch, LUCENE-3440.patch, 
> LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, 
> weight-vs-boost_table02.html
>
>
> The FastVectorHighlighter gives every term found in a fragment the same
> weight. As a result, fragments with many matching words or, in the worst
> case, many very common words are ranked higher than fragments that contain
> *all* of the terms used in the original query.
> This patch provides ordered fragments with IDF-weighted terms. For each
> unique query term in a fragment:
> total weight = total weight + IDF(term) * query boost;
> The ranking formula should be the same as, or at least similar to, the one
> used in org.apache.lucene.search.highlight.QueryTermScorer.
> The patch is simple, but it works for us. 
> Some ideas:
> - A better approach would be to move all fragment scoring into a separate
> class.
> - Make the scoring method switchable via a parameter.
> - Exact phrases should get an even better score, regardless of whether a
> phrase query was executed.
> - The edismax/dismax parameters pf, ps and pf^boost should be honored, and
> the corresponding fragments should be ranked higher.
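The difference between equal weighting and the IDF-weighted rule described above can be illustrated with a small standalone Java sketch. This is not the attached patch and does not use the FastVectorHighlighter API; the IDF formula 1 + ln(numDocs / (docFreq + 1)), the class and method names, and the document frequencies are all assumptions chosen for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the LUCENE-3440 patch): contrasts equal-weight
 * fragment scoring with IDF-weighted scoring of unique query terms.
 */
final class FragmentScoringSketch {

    private FragmentScoringSketch() {}

    /** Assumed classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Equal weighting: every query-term occurrence in the fragment adds the query boost. */
    static double equalWeightScore(List<String> fragmentTerms, Set<String> queryTerms,
                                   double queryBoost) {
        double total = 0.0;
        for (String term : fragmentTerms) {
            if (queryTerms.contains(term)) {
                total += queryBoost;
            }
        }
        return total;
    }

    /** IDF weighting: each unique query term in the fragment adds idf(term) * queryBoost once. */
    static double idfWeightedScore(List<String> fragmentTerms, Map<String, Integer> docFreqs,
                                   int numDocs, double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            Integer df = docFreqs.get(term); // null means the word is not a query term
            if (df != null && seen.add(term)) { // count each unique term only once
                total += idf(df, numDocs) * queryBoost;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        // Hypothetical document frequencies for the three query terms.
        Map<String, Integer> docFreqs = Map.of("java", 900, "vector", 50, "highlighter", 10);
        // Fragment repeating one very common query term.
        List<String> repetitive = List.of("java", "is", "java", "and", "java", "is", "java");
        // Fragment containing all query terms once.
        List<String> complete = List.of("the", "highlighter", "uses", "term", "vector", "data", "in", "java");

        System.out.printf("equal: repetitive=%.2f complete=%.2f%n",
                equalWeightScore(repetitive, docFreqs.keySet(), 1.0),
                equalWeightScore(complete, docFreqs.keySet(), 1.0));
        System.out.printf("idf:   repetitive=%.2f complete=%.2f%n",
                idfWeightedScore(repetitive, docFreqs, numDocs, 1.0),
                idfWeightedScore(complete, docFreqs, numDocs, 1.0));
    }
}
```

With equal weights the repetitive fragment wins (4 matches versus 3). With IDF weighting, the single common term contributes only about 1.1, while the fragment containing all three query terms scores roughly 10.6, which is the ordering the issue asks for.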
