[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785840#action_12785840 ]

Joaquin Perez-Iglesias commented on LUCENE-2091:
------------------------------------------------

Yes sorry.

Basically, what we are trying to do is constrain the effect of the raw
frequency (saturate the frequency).
In Lucene this is done with the square root of the frequency; another
classical approach is to use the log. With both approaches we avoid giving
a linear 'importance' to the frequency.
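To make the difference concrete, here is a small Java sketch comparing a linear
contribution with square-root damping (what Lucene's DefaultSimilarity.tf(float)
applies) and with 1 + ln(freq) log damping. The class and method names are
hypothetical, not Lucene API, and 1 + ln(freq) is only one of several common log
variants:

```java
// Hypothetical helper class, for illustration only (not Lucene API):
// compares a linear term-frequency contribution with two damped variants.
public class TfSaturation {

    // Linear: every extra occurrence adds the same amount.
    static double linearTf(int freq) {
        return freq;
    }

    // Square root, mirroring Lucene's DefaultSimilarity.tf(float).
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // Log damping, 1 + ln(freq); one common variant (assumption).
    // A frequency of 0 contributes nothing.
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 2, 4, 16, 64}) {
            System.out.printf("freq=%2d linear=%5.1f sqrt=%5.2f log=%5.2f%n",
                    freq, linearTf(freq), sqrtTf(freq), logTf(freq));
        }
    }
}
```

Both damped variants keep growing with the frequency, just much more slowly
than the linear one; neither has an upper bound.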

BM25 is a bit trickier: it parametrises the 'saturation' of the frequency
with a parameter k1, using the equation weight(t)/(weight(t)+k1). Usually k1
is fixed to 2, but it can be tuned per collection.
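A minimal Java sketch of that saturation, assuming weight(t) is a (possibly
length-normalised) term frequency. The class, the method names and the
length-normalisation helper with its parameter b are illustrative assumptions,
not part of the contributed patch:

```java
// Hypothetical sketch (not the LUCENE-2091 patch) of the BM25 saturation
// function weight / (weight + k1).
public class Bm25Saturation {

    // Saturated contribution of a term: bounded above by 1 for k1 > 0.
    // Smaller k1 saturates faster; larger k1 stays closer to linear for longer.
    static double saturate(double weight, double k1) {
        if (k1 < 0) {
            throw new IllegalArgumentException("k1 must be non-negative");
        }
        if (weight <= 0) {
            return 0.0; // term absent: no contribution (also avoids 0/0)
        }
        return weight / (weight + k1);
    }

    // Assumption: weight(t) is the raw frequency normalised by document
    // length with parameter b, as in the usual BM25 formulation.
    static double lengthNormalisedWeight(int freq, double docLength,
                                         double avgDocLength, double b) {
        return freq / ((1 - b) + b * docLength / avgDocLength);
    }

    public static void main(String[] args) {
        for (double k1 : new double[] {1.2, 2.0}) {
            for (int freq : new int[] {1, 2, 4, 16, 64}) {
                System.out.printf("k1=%.1f freq=%2d saturated=%.3f%n",
                        k1, freq, saturate(freq, k1));
            }
        }
    }
}
```

However large the frequency gets, the saturated value stays below 1 (for
k1 > 0), and a smaller k1 reaches that plateau sooner.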

> Add BM25 Scoring to Lucene
> --------------------------
>
>                 Key: LUCENE-2091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2091
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Yuval Feinstein
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2091.patch, persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
> Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed 
> boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime 
> somewhat.
> I would like to contribute the code to Lucene under contrib. 

