In an attempt to avoid doubling disk usage when adding new fields to all
existing documents, I added a call to IndexWriter::expungeDeletes. Then my
colleague pointed out that Lucene will rewrite the potentially large segment
files each time that method is called.
reader = writer.getReader();
for (int i=0; i<n; i++) {
Term idTerm = new Term("id", i);
TermDocs termDocs = reader.termDocs(idTerm);
if (termDocs != null && termDocs.next()) {
Document doc = reader.document(termDocs.doc());
doc.add(myfield, value);
writer.updateDocument(idTerm, doc);
//writer.expungeDeletes(true); // BAD: rewrites segment files each time
}
}
reader.close();
writer.commit();
writer.optimize(true);
writer.close();
The following Lucene FAQ response suggests that disk space from deleted
documents will be reclaimed. Is this true and is the savings worthwhile to
update an existing index (followed by optimizing out the deleted documents)
instead of simply creating a new index?
http://wiki.apache.org/lucene-java/LuceneFAQ#If_I_decide_not_to_optimize_the_index.2C_when_will_the_deleted_documents_actually_get_deleted.3F
Thanks for your help,
Justin
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]