Re: Getting the document number (with IndexReader)

Chun Wei Ho Thu, 26 Jan 2006 00:47:55 -0800

Hi,

Thanks for the help. Just a few more questions:


On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
> > I am attempting to prune an index by getting each document in turn and
> > then checking/deleting it:
> >
> > IndexReader ir = IndexReader.open(path);
> > for(int i=0;i<ir.numDocs();i++) {
> >       Document doc = ir.document(i);
> >       if(thisDocShouldBeDeleted(doc)) {
> >               ir.delete(docNum); // <- I need the docNum for doc.
> >       }
> > }
> >
> > How do I get the docNum for the IndexReader.delete() function in the
> > above case? Is there an API function I am missing? I am working with a merged
>
> The document number is the variable i in this case.
If the document number is simply the loop variable i (counting up to
numDocs()), what is the difference between numDocs() and maxDoc() in
this case? I was previously under the impression that the internal
docNum might differ from the counter.
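
For reference, my current understanding of the distinction, as a minimal
sketch: maxDoc() is one more than the largest document number, while
numDocs() counts only the documents that are not deleted, so the two differ
as soon as anything has been deleted and not yet merged away. ToyReader below
is a hypothetical in-memory stand-in written for illustration (it is not
Lucene's IndexReader), and shouldDelete() is a made-up predicate. If the
model is right, the loop should run to maxDoc() and skip deleted slots.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy stand-in for an index reader. This is NOT Lucene's IndexReader; it only
// models the numbering behaviour assumed in the text above.
final class ToyReader {
    private final String[] docs;                 // slot i holds document number i
    private final BitSet deleted = new BitSet(); // deleted slots keep their number

    ToyReader(String... docs) { this.docs = docs.clone(); }

    // One greater than the largest possible document number.
    int maxDoc() { return docs.length; }

    // Number of documents that are not marked deleted.
    int numDocs() { return docs.length - deleted.cardinality(); }

    boolean isDeleted(int n) { return deleted.get(n); }

    String document(int n) {
        if (deleted.get(n)) throw new IllegalArgumentException("deleted: " + n);
        return docs[n];
    }

    void delete(int n) { deleted.set(n); }
}

final class Prune {
    // Hypothetical predicate standing in for thisDocShouldBeDeleted(doc).
    static boolean shouldDelete(String doc) { return doc.startsWith("old"); }

    // Visit every slot up to maxDoc(), skip deleted ones, delete matches.
    // Returns the document numbers that were deleted by this pass.
    static List<Integer> prune(ToyReader r) {
        List<Integer> removed = new ArrayList<>();
        for (int i = 0; i < r.maxDoc(); i++) {
            if (r.isDeleted(i)) continue;        // slot still exists, doc is gone
            if (shouldDelete(r.document(i))) {
                r.delete(i);                     // i itself is the document number
                removed.add(i);
            }
        }
        return removed;
    }
}
```

Looping only to numDocs() in this model would stop short of the last slots
whenever earlier documents had already been deleted.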

> > index over different segments so the docNum might not be in running
> > sequence with the counter i.
> > In general, is there a better way to do this sort of thing?
>
> This code:
>
>         Document doc = ir.document(i);
>
> normally retrieves all the stored fields of the document and that is
> quite costly. In case you know that the document(s) to be deleted
> match(es) a Term, it's better to use IndexReader.delete(Term).

I'm doing something akin to a RangeQuery: I delete documents that fall
within a certain range and also meet some other criteria. Would it be
better to run a query on the range, collect the matching docNums with
Hits.id(), and then retrieve only those docs and test each one for
deletion?
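
To make the alternative concrete, here is a rough sketch of the range part
only, done by walking the terms in the range and deleting the documents
listed under each term, without loading any stored fields. ToyPostings is a
hypothetical in-memory model (a sorted map from term text to document
numbers), not a Lucene class; with a real index I assume the equivalent
would go through IndexReader.terms(Term) and IndexReader.termDocs(Term).
The extra criteria would still need a separate check per document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

// Toy model of one indexed field: sorted term text -> document numbers.
// Hypothetical and for illustration only; not part of Lucene.
final class ToyPostings {
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    private final BitSet deleted = new BitSet();

    // Record that document docNum contains the given term.
    void add(int docNum, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docNum);
    }

    // Mark as deleted every document whose term lies in [lower, upper],
    // compared as strings. Returns how many documents were newly deleted.
    // Requires lower <= upper, otherwise subMap throws IllegalArgumentException.
    int deleteRange(String lower, String upper) {
        int count = 0;
        for (List<Integer> docNums : postings.subMap(lower, true, upper, true).values()) {
            for (int n : docNums) {
                if (!deleted.get(n)) {
                    deleted.set(n);
                    count++;
                }
            }
        }
        return count;
    }

    boolean isDeleted(int n) { return deleted.get(n); }
}
```

The range comparison is lexicographic, so this only behaves like a numeric or
date range if the terms are stored in a sortable form such as yyyyMMdd.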

Thanks for the help
