On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Mark Miller wrote:
> >
> > it looks expensive to me to do both
> > of them properly.
> Okay - I guess that somewhat makes sense - you can calculate the
> magnitude of the doc vectors at index time. How is that impossible with
> incremental indexing though? Isn't it just expensive? Seems somewhat
> expensive in the non-incremental case as well - you're just eating it at
> index time rather than query time - though the same could be done for
> incremental? The information is all there in either case.
>

The expense, if you have the idfs of all terms in the vocabulary (keep them
in the form of idf^2 for efficiency at index time), is pretty trivial, isn't
it?  If you have a document with 1000 terms, it's maybe 3000 floating point
operations - all CPU-bound, in memory, with no disk seeks.
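To make the arithmetic concrete, here's a minimal sketch of that per-document norm computation. The idf^2 table and method names are hypothetical, not a real Lucene API - the point is just that each term costs about three floating point operations (tf*tf, multiply by idf^2, accumulate), so 1000 terms is roughly 3000 ops:

```java
import java.util.HashMap;
import java.util.Map;

public class DocNormSketch {

    // Hypothetical: idfSquared maps each term to its idf^2, externalized
    // ahead of time so the norm can be computed at index time.
    static double docNorm(Map<String, Integer> termFreqs,
                          Map<String, Double> idfSquared) {
        double sumSq = 0.0;
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int tf = e.getValue();
            double idf2 = idfSquared.getOrDefault(e.getKey(), 0.0);
            // ~3 flops per term: tf*tf, times idf^2, plus accumulation.
            sumSq += (double) tf * tf * idf2;
        }
        return Math.sqrt(sumSq);
    }
}
```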

What it does require is knowing, even when you have no documents yet
on disk, what the idfs of the terms in the first few documents are.  How do
you know this, in Lucene, if you haven't externalized some notion of idf?

  -jake


