Okay - my fault - I'm not really talking in terms of Lucene. Though even there I consider it possible. You'd just have to like, rewrite it :) And it would likely be pretty slow.
Jake Mannix wrote:
>
> On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>     Mark Miller wrote:
>     > > it looks expensive to me to do both
>     > > of them properly.
>     Okay - I guess that somewhat makes sense - you can calculate the
>     magnitude of the doc vectors at index time. How is that impossible with
>     incremental indexing though? Isn't it just expensive? Seems somewhat
>     expensive in the non-incremental case as well - you're just eating it at
>     index time rather than query time - though the same could be done for
>     incremental? The information is all there in either case.
>
> The expense, if you have the idfs of all terms in the vocabulary (keep them
> in the form of idf^2 for efficiency at index time), is pretty trivial,
> isn't it? If you have a document with 1000 terms, it's maybe 3000 floating
> point operations, all CPU actions, in memory, no disk seeks.
>
> What it does require is knowing, even when you have no documents yet
> on disk, what the idf of terms in the first few documents is. Where do
> you know this, in Lucene, if you haven't externalized some notion of idf?
>
> -jake
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
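For a rough sense of the cost Jake is describing - the per-document magnitude computation at index time, given externalized idf values kept as idf^2 - a minimal sketch might look like the following. This is not Lucene API; the class, the idf^2 map, and the term-frequency map are all assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class DocNorm {
    // Hypothetical: idf^2 per term, externalized from some prior corpus
    // or external vocabulary, so it is available before any docs are indexed.
    private final Map<String, Double> idfSquared;

    public DocNorm(Map<String, Double> idfSquared) {
        this.idfSquared = idfSquared;
    }

    /**
     * Magnitude of the tf-idf vector for one document:
     * sqrt( sum over terms of (tf * idf)^2 ) = sqrt( sum tf^2 * idf^2 ).
     * With ~1000 distinct terms this is a few thousand floating-point
     * operations, all in memory - no disk seeks.
     */
    public double magnitude(Map<String, Integer> termFreqs) {
        double sumSq = 0.0;
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            double idf2 = idfSquared.getOrDefault(e.getKey(), 0.0);
            double tf = e.getValue();
            sumSq += tf * tf * idf2; // two multiplies + one add per term
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        Map<String, Double> idf2 = new HashMap<>();
        idf2.put("lucene", 4.0); // idf = 2.0
        idf2.put("index", 1.0);  // idf = 1.0
        DocNorm norm = new DocNorm(idf2);

        Map<String, Integer> doc = new HashMap<>();
        doc.put("lucene", 3);
        doc.put("index", 4);
        // sqrt(3^2 * 4.0 + 4^2 * 1.0) = sqrt(52)
        System.out.println(norm.magnitude(doc));
    }
}
```

The precomputed norm could then be stored alongside the document, which is the part that seems hard to do incrementally in stock Lucene without externalizing idf, since idf changes as documents arrive.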