[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013907#comment-13013907 ]
David Mark Nemeskey commented on LUCENE-2959:
---------------------------------------------

Robert: thanks for all the info! It's nice to see that so much work has already been done. I plan to delve into it after the selection and to get other things out of the way until then, so that I can concentrate on GSoC during the summer.

I think the main point should be to make adding a new ranking function as easy as possible. At least a prototype implementation should be very straightforward to write, even at the expense of performance. Then, if the new method gives good results, the developer can go down to the lower level to squeeze more juice out of it. It's hard for me to discuss this now without knowing the code, of course, but do you think that is possible?

Even though I added a "Performance" section to my proposal (http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1), I now see that performance is probably more important than I first believed. I think I will follow your advice and concentrate on how to make BM25F fast. It may be a somewhat tougher nut to crack than DFR, as the latter has logarithms scattered all over it. However, the first thing that comes to mind is that the BM25 tf curve (the term-frequency component of the score, plotted against tf) flattens out very quickly, though less so for high values of k1. So it may be possible to pre-compute a tf map or array for a query.

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Examples, Javadocs, Query/Scoring
>            Reporter: David Mark Nemeskey
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state-of-the-art algorithms, such as BM25.
> Moreover, the architecture is tailored specifically to VSM, which makes the
> addition of new ranking functions a non-trivial task.
> This project aims to bring state-of-the-art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.

-- 
This message is automatically generated by JIRA.
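A minimal Java sketch of the "pre-compute a tf array" idea, assuming the standard BM25 term-frequency component w(tf) = tf * (k1 + 1) / (tf + K), with K = k1 * (1 - b + b * dl / avgdl). The class and method names (Bm25TfCache, tfWeight, lengthNorm) are hypothetical and not Lucene API. Because K depends on document length, a single table is only valid while K stays fixed, e.g. with b = 0, or with one table per quantized document-length bucket; that is an assumption of this sketch, not something stated above.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Lucene API): caches the BM25 tf component
 *   w(tf) = tf * (k1 + 1) / (tf + K),  K = k1 * (1 - b + b * dl / avgdl)
 * for tf = 0..maxCachedTf. Assumes K is fixed while the cache is in use
 * (e.g. b = 0, or one cache per quantized document-length bucket).
 */
final class Bm25TfCache {
    private final double k1;
    private final double lengthNorm; // K in the formula above
    private final double[] cache;    // cache[tf] = w(tf)

    Bm25TfCache(double k1, double lengthNorm, int maxCachedTf) {
        if (k1 < 0 || lengthNorm < 0 || maxCachedTf < 0) {
            throw new IllegalArgumentException("k1, lengthNorm and maxCachedTf must be non-negative");
        }
        this.k1 = k1;
        this.lengthNorm = lengthNorm;
        this.cache = new double[maxCachedTf + 1];
        for (int tf = 0; tf <= maxCachedTf; tf++) {
            cache[tf] = compute(tf);
        }
    }

    /** K = k1 * (1 - b + b * docLen / avgDocLen), the BM25 length normalization. */
    static double lengthNorm(double k1, double b, double docLen, double avgDocLen) {
        return k1 * (1 - b + b * docLen / avgDocLen);
    }

    private double compute(int tf) {
        if (tf <= 0) {
            return 0.0; // also avoids 0/0 when K == 0
        }
        return tf * (k1 + 1) / (tf + lengthNorm);
    }

    /** Table lookup for small tf; direct computation beyond the cached range. */
    double tfWeight(int tf) {
        return (tf >= 0 && tf < cache.length) ? cache[tf] : compute(tf);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        // b = 0, so K = k1 regardless of document length
        Bm25TfCache cache = new Bm25TfCache(k1, lengthNorm(k1, 0.0, 1, 1), 8);
        for (int tf : new int[] {1, 2, 3, 5, 10, 100}) {
            System.out.printf(Locale.ROOT, "tf=%3d  w=%.4f%n", tf, cache.tfWeight(tf));
        }
        System.out.printf(Locale.ROOT, "upper bound k1+1 = %.4f%n", k1 + 1);
    }
}
```

With k1 = 1.2 and b = 0 (so K = 1.2), w(1) = 1.0, w(2) = 1.375 and w(10) is about 1.96, already close to the upper bound k1 + 1 = 2.2; this is the flattening described in the comment. Values above maxCachedTf fall back to direct computation, so the table only needs to cover the common small-tf range.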