In an attempt to avoid doubling disk usage when adding new fields to all existing documents, I added a call to IndexWriter.expungeDeletes after each update. A colleague then pointed out that Lucene rewrites the potentially large segment files every time that method is called.
    reader = writer.getReader();
    for (int i = 0; i < n; i++) {
        // Term values are strings, so the numeric id must be converted
        Term idTerm = new Term("id", Integer.toString(i));
        TermDocs termDocs = reader.termDocs(idTerm);
        if (termDocs != null && termDocs.next()) {
            // Load the existing document, add the new field, and replace the old copy
            Document doc = reader.document(termDocs.doc());
            doc.add(myfield, value); // shorthand for adding the new field
            writer.updateDocument(idTerm, doc);
            //writer.expungeDeletes(true); // BAD: rewrites segment files each time
        }
    }
    reader.close();
    writer.commit();
    writer.optimize(true);
    writer.close();

The following Lucene FAQ entry suggests that the disk space held by deleted documents will eventually be reclaimed. Is this true, and are the savings large enough to justify updating an existing index (and then optimizing away the deleted documents) rather than simply creating a new index?

http://wiki.apache.org/lucene-java/LuceneFAQ#If_I_decide_not_to_optimize_the_index.2C_when_will_the_deleted_documents_actually_get_deleted.3F

Thanks for your help,
Justin