On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Okay - my fault - I'm not really talking in terms of Lucene. Though even
> there I consider it possible. You'd just have to like, rewrite it :) And
> it would likely be pretty slow.
>

Rewrite it how? When you index the very first document, the docFreq of
all terms is 1, out of numDocs = 1 docs in the corpus. Everybody's idf is
the same. No matter how you normalize this, it'll be wrong once you've
indexed a million documents. This isn't a matter of Lucene architecture;
it's a matter of idf being a value that is only known exactly at query
time (you can approximate it partway through indexing, but you don't know
it at all when you start).

-jake
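To make the idf point concrete, here is a minimal sketch of what an index-time idf would look like as documents arrive. It assumes the textbook form idf = ln(numDocs / docFreq), which is not necessarily the exact formula Lucene uses, and the names `idf`, `index_incrementally`, and the toy corpus are all made up for illustration.

```python
import math
from collections import Counter


def idf(doc_freq, num_docs):
    # Textbook idf (an assumption for this sketch, not Lucene's exact formula).
    return math.log(num_docs / doc_freq)


def index_incrementally(docs):
    """Return, for each document added, the idf of every term seen so far."""
    doc_freq = Counter()
    snapshots = []
    for num_docs, doc in enumerate(docs, start=1):
        # Count each term at most once per document.
        doc_freq.update(set(doc.split()))
        snapshots.append({t: idf(df, num_docs) for t, df in doc_freq.items()})
    return snapshots


# Hypothetical toy corpus.
docs = ["the lucene index", "the query", "the scorer", "the lucene term"]
snapshots = index_incrementally(docs)
first, last = snapshots[0], snapshots[-1]

for term in ("the", "lucene", "index"):
    print(term, first[term], last[term])
```

After the first document every term has the same idf, so anything baked into the index at that point carries no information about rarity. By the fourth document the same three terms have diverged, and they would keep shifting with every further document.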