On 4/12/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > docfreqs (idfs) do not take into account deleted docs.
> > This is more of an engineering tradeoff rather than a feature.
> > If we could cheaply and easily update idfs when documents are deleted
> > from an index, we would.
>
> Wow.  So is it fair to say that the stored IDF is really the
> cumulative IDF for all the documents that have ever been in the index
> since it was last optimized?

Not quite... all documents that are marked as deleted, but haven't
actually been removed from the index.  Adding new documents sometimes
causes segments to be merged, and the resulting new segment will have
no deleted docs.

The difference between IndexReader.maxDoc() and numDocs() tells you
how many documents have been marked for deletion but still take up
space in the index.
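
To make the bookkeeping concrete, here is a minimal toy model in Python.
It is only a sketch and not Lucene's actual code: ToySegment, merge(),
and the snake_case method names are hypothetical stand-ins chosen to
mirror maxDoc(), numDocs(), and docFreq(). It assumes deletion only sets
a mark and that a merge copies live documents only, as described above.

```python
class ToySegment:
    """Toy stand-in for one index segment (hypothetical, not Lucene's code)."""

    def __init__(self, docs):
        # Each document is a set of terms; its id is its position in the list.
        self.docs = [set(d) for d in docs]
        self.deleted = set()  # ids marked deleted but still physically stored

    def delete(self, doc_id):
        # Deletion only records a mark; the stored terms are left untouched.
        self.deleted.add(doc_id)

    def max_doc(self):
        # Counts every stored document, including those marked deleted.
        return len(self.docs)

    def num_docs(self):
        # Counts live documents only.
        return len(self.docs) - len(self.deleted)

    def doc_freq(self, term):
        # Counts every stored document containing the term, deleted or not.
        return sum(1 for d in self.docs if term in d)


def merge(*segments):
    """Build a new segment from the live documents of the given segments."""
    live = [d for s in segments
            for i, d in enumerate(s.docs) if i not in s.deleted]
    return ToySegment(live)


seg = ToySegment([{"apple", "pie"}, {"apple"}, {"pear"}, {"apple", "pear"}])
seg.delete(1)
seg.delete(3)

# Documents marked for deletion that still take up space.
pending = seg.max_doc() - seg.num_docs()

# The document frequency still includes the two deleted docs.
stale_df = seg.doc_freq("apple")

# After a merge, only live docs remain, so the count drops.
merged = merge(seg)
fresh_df = merged.doc_freq("apple")
```

In this model the stale document frequency, and so any idf derived from
it, persists only until the segment holding the deleted docs is merged
away, at which point max_doc() and num_docs() agree again.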

-Yonik
