[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2959:
--------------------------------
    Attachment: LUCENE-2959_nocommits.patch

Patch removing all nocommits.

For the fake IDF/phrase issue, I thought it best not to "fake" statistics to SimilarityBase, since the whole point is to make it simpler to implement and test ranking models. Instead, it sums scores across terms (kind of like a boolean query).

For DFR P and D, I don't think there are really any great practical ways out of the fundamental problem. I added notes to both of these.

I think the workaround for Dirichlet is fine. I looked around and found another implementation of this smoothing by Hiemstra, and it had the same workaround (http://mirex.sourceforge.net / trec.nist.gov/pubs/trec19/papers/univ.twente.web.rev.pdf).

All the other similarities seem to work fine when randomly swapped into Lucene's tests.

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/query/scoring, general/javadocs, modules/examples
>            Reporter: David Mark Nemeskey
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: flexscoring branch
>
>         Attachments: LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch,
>                      implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at
> http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.
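
For readers unfamiliar with the "sum scores across terms" approach mentioned in the comment above, here is a minimal sketch of the idea. It is not the code in the patch and does not use Lucene's API: the class PhraseScoreSketch, the TermStats holder, and the tf-idf-style scoreTerm formula are all hypothetical stand-ins chosen for illustration. The point is only that a phrase is scored by applying an ordinary single-term model to each term's real statistics and adding up the results, rather than inventing combined statistics for the phrase.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names come from Lucene or the patch.
class PhraseScoreSketch {

    /** Hypothetical holder for the real collection statistics of one term. */
    static final class TermStats {
        final long docFreq;  // number of documents containing the term
        final long numDocs;  // number of documents in the collection

        TermStats(long docFreq, long numDocs) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
        }
    }

    /** Hypothetical single-term ranking model (a plain tf-idf-style score). */
    static double scoreTerm(TermStats stats, double freq, double docLen) {
        double idf = Math.log(1.0 + (double) stats.numDocs / stats.docFreq);
        return (freq / docLen) * idf;
    }

    /**
     * Scores a phrase by summing the single-term score over its terms,
     * using the phrase frequency as the frequency for every term.
     * No aggregated or "fake" statistics are passed to the model.
     */
    static double scorePhrase(List<TermStats> terms, double phraseFreq, double docLen) {
        double sum = 0.0;
        for (TermStats t : terms) {
            sum += scoreTerm(t, phraseFreq, docLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up statistics for a two-term phrase.
        TermStats first = new TermStats(10, 1000);
        TermStats second = new TermStats(200, 1000);
        System.out.println(scorePhrase(Arrays.asList(first, second), 2.0, 100.0));
    }
}
```

Under this scheme the ranking model only ever sees genuine per-term statistics, which keeps a single-term model simple to write and test; the cost is that the phrase score is a sum of independent term contributions rather than something derived from statistics of the phrase itself.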