On Thursday 26 January 2006 09:47, Chun Wei Ho wrote:
> Hi,
>
> Thanks for the help, just a few more questions:
>
> On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
> > > I am attempting to prune an index by getting each document in turn and
> > > then checking/deleting it:
> > >
> > > IndexReader ir = IndexReader.open(path);
> > > for(int i=0;i<ir.numDocs();i++) {
> > >     Document doc = ir.document(i);
> > >     if(thisDocShouldBeDeleted(doc)) {
> > >         ir.delete(docNum); // <- I need the docNum for doc.
> > >     }
> > > }
> > >
> > > How do I get the docNum for IndexReader.delete() function in the above
> > > case? Is there a API function I am missing? I am working with a merged
> >
> > The document number is the variable i in this case.
>
> If the document number is the variable i (enumerated from numDocs()),
> what's the difference between numDocs() and maxDoc() in this case? I
> was previously under the impression that the internal docNum might be
> different to the counter.

Iirc, the difference between maxDoc() and numDocs() is the number of
deleted documents. Check the javadocs to be sure. If the index already
contains deletions, the loop should therefore run up to maxDoc() and
skip the document numbers for which isDeleted(i) is true.

> > > index over different segments so the docNum might not be in running
> > > sequence with the counter i.
> > > In general, is there a better way to do this sort of thing?
> >
> > This code:
> >
> > Document doc = ir.document(i);
> >
> > normally retrieves all the stored fields of the document and that is
> > quite costly. In case you know that the document(s) to be deleted
> > match(es) a Term, it's better to use IndexReader.delete(Term).
>
> I'm doing something akin to a rangeQuery, where I delete documents
> within a certain range (in addition to other criteria). Is it better
> to do a query on the range, mark all the docNums getting them with
> Hits.id(), and then retrieve docs and test for deletion according to
> that?

In that case it is faster to take the Terms generated inside the range
query and pass each of them to IndexReader.delete(Term).
To generate the terms, have a look at the source code of the rewrite()
method of RangeQuery here:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/

Regards,
Paul Elschot
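
P.S. To make the numDocs()/maxDoc() point concrete, here is a small
toy model of document numbering. It is only a sketch and not Lucene
code: the ToyIndex class and its prune() method are made up for
illustration, and they merely mirror the reader methods mentioned in
this thread (maxDoc(), numDocs(), isDeleted(), delete()). It assumes
that deleting a document only marks its number as deleted and does not
renumber the remaining documents.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Toy model of document numbering. This is NOT Lucene code; every name here
// is hypothetical and only mirrors the reader methods named in the thread.
final class ToyIndex {
    private final int maxDoc;                     // one greater than the largest document number
    private final BitSet deleted = new BitSet();  // bit n set => document n is marked deleted

    ToyIndex(int documentCount) {
        this.maxDoc = documentCount;
    }

    int maxDoc() { return maxDoc; }

    // Live documents only: deletions lower numDocs() but leave maxDoc() unchanged.
    int numDocs() { return maxDoc - deleted.cardinality(); }

    boolean isDeleted(int docNum) { return deleted.get(docNum); }

    void delete(int docNum) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new IllegalArgumentException("no such document: " + docNum);
        }
        deleted.set(docNum);
    }

    // Visits every document number, skips the ones already deleted, and deletes
    // those accepted by the predicate. Returns how many were newly deleted.
    int prune(IntPredicate shouldDelete) {
        int removed = 0;
        for (int i = 0; i < maxDoc; i++) {        // bound is maxDoc, not numDocs()
            if (isDeleted(i)) continue;
            if (shouldDelete.test(i)) {
                delete(i);
                removed++;
            }
        }
        return removed;
    }
}
```

With six documents of which number 1 is already deleted, numDocs() is 5
while maxDoc() is still 6, so a loop bounded by numDocs() would never
reach document 5. The bound would also shrink further with every
deletion made inside the loop.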