My application for Lucene involves updating an existing index with a mixture of new and revised documents. From what I've been able to discern from reading, I'm going to have to delete the old versions of the revised documents before indexing them again. Since this indexing will probably take quite a while, given the number of new/revised documents I'll be adding and the large number of documents already in the index, I'm uncomfortable keeping an IndexReader and an IndexWriter open for long periods of time.

What I'm considering doing is reading the file containing the multiple documents twice. On the first pass I test whether each document is already in the index and delete it if it is, with something like:

The "Reference" term is unique.

...
   String ref;
   while ((ref = getNextDocument()) != null) {
     Term t = new Term("Reference", ref);
     TermDocs td = indexReader.termDocs(t);
     // termDocs() doesn't return null; next() says whether a match exists
     if (td.next()) {
       indexReader.delete(td.doc());
     }
     td.close();
   }

Or should I not bother to look for the term at all and do something like this?

   String ref;
   while ((ref = getNextDocument()) != null) {
     indexReader.delete(new Term("Reference", ref));
   }

Is either of these more efficient than the other?

Then I would close the indexReader, go back, and reread the file, indexing merrily away.
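To make sure I'm describing the flow clearly, here is a minimal sketch of the two-pass shape I have in mind, using a plain `Map` as a stand-in for the index (the `TwoPassUpdate` class, its methods, and the sample data are hypothetical illustrations, not Lucene API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the delete-then-reindex flow with a Map standing in for
// the index; the map key plays the role of the unique "Reference" term.
public class TwoPassUpdate {

    // Pass 1: remove any existing entry whose reference matches an
    // incoming document (mirrors indexReader.delete(term)).
    static void deletePass(Map<String, String> index, List<String> incomingRefs) {
        for (String ref : incomingRefs) {
            index.remove(ref); // no-op if the reference isn't present
        }
    }

    // Pass 2: re-read the input and add every document fresh
    // (mirrors indexWriter.addDocument(doc)).
    static void addPass(Map<String, String> index, Map<String, String> incoming) {
        index.putAll(incoming);
    }

    public static void main(String[] args) {
        Map<String, String> index = new LinkedHashMap<>();
        index.put("ref-1", "old body 1");
        index.put("ref-2", "old body 2");

        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("ref-2", "revised body 2"); // revised document
        incoming.put("ref-3", "new body 3");     // brand-new document

        deletePass(index, new ArrayList<>(incoming.keySet()));
        addPass(index, incoming);

        System.out.println(index.size());       // 3
        System.out.println(index.get("ref-2")); // revised body 2
    }
}
```

The point of the two passes is that the reader (doing deletes) is closed before the writer (doing adds) is opened, so the two are never held open at the same time.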

Should I be concerned about keeping both an indexReader and an indexWriter open at the same time? Other processes will probably be running searches while this is going on. I'm not concerned about those searches failing to find the data I'm currently adding; I'm more concerned about locking them out.

A couple of assumptions you can take as valid: the Reference term is unique in the index, and each reference appears only once in the input file.

Thanks,
Jim.
