[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013907#comment-13013907
 ] 

David Mark Nemeskey commented on LUCENE-2959:
---------------------------------------------

Robert: thanks for all the info! It's nice to see so much work has already been 
done. I plan to delve into it after the selection, and try to get other things 
out of the way until then, so that I can concentrate on GSoC during the summer.

I think the main point would be to make the addition of a new ranking function 
as easy as possible. At least a prototype implementation should be very 
straightforward, even at the expense of performance. Then, if the new method 
provides good results, the developer can go on to the lower level to squeeze 
more juice out of it. It's hard for me to discuss this without knowing the 
code, of course, but do you think it is possible?
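To illustrate what "very straightforward" could look like for a prototype, here 
is a minimal sketch. The SimpleRanker interface and its parameters are 
hypothetical names made up for this example, not existing Lucene API; the 
implementation is the textbook BM25 formula with no attention paid to speed.

```java
/** Hypothetical plug-in point (not Lucene API): one method, plain statistics in, score out. */
interface SimpleRanker {
    /**
     * @param tf        term frequency in the document
     * @param docLen    length of the document in tokens
     * @param avgDocLen average document length in the collection
     * @param docFreq   number of documents containing the term
     * @param numDocs   number of documents in the collection
     */
    double score(double tf, double docLen, double avgDocLen, long docFreq, long numDocs);
}

/** Textbook BM25 as a prototype: readable first, optimized later (if ever). */
final class Bm25Ranker implements SimpleRanker {
    private final double k1; // tf saturation parameter
    private final double b;  // length normalization parameter, in [0, 1]

    Bm25Ranker(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    public double score(double tf, double docLen, double avgDocLen, long docFreq, long numDocs) {
        // Smoothed idf variant that stays positive even for very common terms.
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Documents longer than average are penalized, shorter ones are boosted.
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        double tfPart = tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
        return idf * tfPart;
    }
}
```

With something like this, trying out a new ranking function would mean writing 
one small class; how the statistics reach it is exactly the part I cannot judge 
without knowing the code.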

Even though I added a "Performance" section to my proposal 
(http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1),
I see now that it's probably more important than I believed it to be at first. 
I think I will follow your advice and concentrate on how to make BM25F fast. It 
may be a bit of a tougher nut to crack than DFR, as the latter has logarithms 
scattered all over it. However, the first thing that comes to mind is that the 
tf-BM25 curve becomes almost flat very quickly (less so for a high k1 value, 
though). So it may be possible to pre-compute a tf map or array for a query.
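
A rough sketch of the pre-computed tf array idea follows. The class name and 
the cache size are assumptions chosen for illustration, and length 
normalization is deliberately left out (as if b = 0), so this only shows the 
lookup-table part, not a complete BM25 or BM25F scorer.

```java
/** Hypothetical helper (not Lucene API): caches the BM25 tf component for small tf values. */
final class Bm25TfTable {
    private final float k1;
    private final float[] table; // table[tf] holds the tf component for 0 <= tf <= maxCachedTf

    Bm25TfTable(float k1, int maxCachedTf) {
        if (k1 < 0f || maxCachedTf < 0) {
            throw new IllegalArgumentException("k1 and maxCachedTf must be non-negative");
        }
        this.k1 = k1;
        this.table = new float[maxCachedTf + 1];
        for (int tf = 0; tf <= maxCachedTf; tf++) {
            table[tf] = compute(tf);
        }
    }

    /** tf component without length normalization; approaches k1 + 1 as tf grows. */
    private float compute(int tf) {
        if (tf == 0) {
            return 0f; // also avoids 0/0 when k1 == 0
        }
        return tf * (k1 + 1f) / (tf + k1);
    }

    /** Array lookup for cached tf values, direct computation for the rare large ones. */
    float score(int tf) {
        if (tf < 0) {
            throw new IllegalArgumentException("tf must be non-negative");
        }
        return tf < table.length ? table[tf] : compute(tf);
    }
}
```

Because the curve flattens quickly, a small table should cover most postings 
for a low k1; for a high k1 the table would need to be larger before the values 
stop changing noticeably.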

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Examples, Javadocs, Query/Scoring
>            Reporter: David Mark Nemeskey
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, 
> proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
