I set maxBufDocs=2 so that I get a segment flushed, and indeed after delete I see _0.del.
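To compare the two runs side by side, here is a minimal sketch of a toy model. The `ToyWriter` class and its methods are hypothetical; this is not Lucene code. It encodes only the behaviour described in this thread:

* Reaching maxBufferedDocs flushes a new segment.
* Reaching maxBufferedDeleteTerms applies the buffered delete terms to segments that are already flushed, but never flushes a new segment by itself.

Deletes against documents still in the RAM buffer are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT Lucene's IndexWriter. It only encodes what this
// thread describes: hitting maxBufferedDocs flushes a new segment, while hitting
// maxBufferedDeleteTerms applies buffered delete terms to already-flushed
// segments (where a .del file would appear) without flushing a new segment.
// Deletes against docs still in the RAM buffer are not modeled (a real writer
// keeps them until the buffered docs are flushed).
public class ToyWriter {
  private final int maxBufferedDocs;
  private final int maxBufferedDeleteTerms;
  private final List<String> ramBuffer = new ArrayList<>();                 // ids not yet flushed
  private final Map<String, List<String>> segments = new LinkedHashMap<>(); // segment name -> ids
  private final List<String> bufferedDeleteIds = new ArrayList<>();
  private final Set<String> segmentsWithDeletes = new LinkedHashSet<>();

  public ToyWriter(int maxBufferedDocs, int maxBufferedDeleteTerms) {
    this.maxBufferedDocs = maxBufferedDocs;
    this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
  }

  public void addDocument(String id) {
    ramBuffer.add(id);
    if (ramBuffer.size() >= maxBufferedDocs) {
      // Doc-count trigger: flush the RAM buffer as a new segment _0, _1, ...
      segments.put("_" + segments.size(), new ArrayList<>(ramBuffer));
      ramBuffer.clear();
    }
  }

  public void deleteDocuments(String id) {
    bufferedDeleteIds.add(id);
    if (bufferedDeleteIds.size() >= maxBufferedDeleteTerms) {
      // Delete-term trigger: apply deletes to flushed segments only.
      for (Map.Entry<String, List<String>> seg : segments.entrySet()) {
        for (String deleted : bufferedDeleteIds) {
          if (seg.getValue().contains(deleted)) {
            segmentsWithDeletes.add(seg.getKey());
          }
        }
      }
      bufferedDeleteIds.clear();
    }
  }

  public Set<String> segmentNames() { return segments.keySet(); }

  public Set<String> segmentsWithDeletes() { return segmentsWithDeletes; }

  // Replays the unit test from this thread: 4 docs, then delete doc-0 and doc-1,
  // with maxBufferedDeleteTerms = 1.
  public static ToyWriter replay(int maxBufferedDocs) {
    ToyWriter w = new ToyWriter(maxBufferedDocs, 1);
    for (int i = 0; i < 4; i++) {
      w.addDocument("doc-" + i);
    }
    w.deleteDocuments("doc-0");
    w.deleteDocuments("doc-1");
    return w;
  }

  public static void main(String[] args) {
    for (int maxDocs : new int[] {10, 2}) {
      ToyWriter w = replay(maxDocs);
      System.out.println("maxBufferedDocs=" + maxDocs + " segments=" + w.segmentNames()
          + " segmentsWithDeletes=" + w.segmentsWithDeletes());
    }
  }
}
```

In this model, maxBufferedDocs=10 prints `segments=[]` and `segmentsWithDeletes=[]`. No segment exists yet, so applying the deletes has nothing to write to. With maxBufferedDocs=2, it prints `segments=[_0, _1]` and `segmentsWithDeletes=[_0]`, because doc-0 and doc-1 both live in `_0`. This is consistent with the _0.del file reported in this message.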
So I guess this is just a docs inconsistency. I'll clarify the FlushPolicy docs.

Shai

On Thu, Aug 1, 2013 at 6:24 PM, Shai Erera <ser...@gmail.com> wrote:

> > I think the doc is correct
>
> Wait, one of the docs is wrong. I guess, according to what you write, it's
> FlushPolicy, as a new segment is not flushed per this setting?
> Or perhaps they should be clarified to say that the deletes are flushed (==
> applied) on existing segments?
>
> I disabled reader pooling and I still don't see .del files. But I think
> that's explained by there being no segments in the index yet.
> All documents are still in the RAM buffer, and according to what you
> write, I shouldn't see any segment because of delTerms?
>
> Shai
>
>
> On Thu, Aug 1, 2013 at 5:40 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> First off, it's bad that you don't see .del files when
>> conf.setMaxBufferedDeleteTerms is 1.
>>
>> But it could be that newIndexWriterConfig turned on readerPooling,
>> which would mean the deletes are held in the SegmentReader and not
>> flushed to disk. Can you make sure that's off?
>>
>> Second off, I think the doc is correct: a segment will not be flushed;
>> rather, new .del files should appear against older segments.
>>
>> And yes, if RAM usage of the buffered delete Terms/Queries is too high,
>> then a segment is flushed along with the deletes being applied
>> (creating the .del files).
>>
>> I think buffered delete Queries are not counted towards
>> setMaxBufferedDeleteTerms, so they are only flushed by RAM usage
>> (rough rough estimate) or by other ops (merging, NRT reopen, commit,
>> etc.).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 1, 2013 at 9:03 AM, Shai Erera <ser...@gmail.com> wrote:
>> > Hi
>> >
>> > I'm a little confused about FlushPolicy and
>> > IndexWriterConfig.setMaxBufferedDeleteTerms documentation.
>> > FlushPolicy jdocs say:
>> >
>> >   * Segments are traditionally flushed by:
>> >   * <ul>
>> >   * <li>RAM consumption - configured via
>> >   ...
>> >   * <li>Number of buffered delete terms/queries - configured via
>> >   * {@link IndexWriterConfig#setMaxBufferedDeleteTerms(int)}</li>
>> >   * </ul>
>> >
>> > Yet IWC.setMaxBufferedDeleteTerms says:
>> >
>> >   NOTE: This setting won't trigger a segment flush.
>> >
>> > And FlushByRamOrCountPolicy says:
>> >
>> >   * <li>{@link #onDelete(DocumentsWriterFlushControl,
>> >   DocumentsWriterPerThreadPool.ThreadState)} - flushes
>> >   * based on the global number of buffered delete terms iff
>> >   * {@link IndexWriterConfig#getMaxBufferedDeleteTerms()} is enabled</li>
>> >
>> > Confused, I wrote a short unit test:
>> >
>> > // Runs inside a LuceneTestCase subclass (newIndexWriterConfig,
>> > // random() and TEST_VERSION_CURRENT come from there).
>> > public void testMaxBufDelTerm() throws Exception {
>> >   Directory dir = new RAMDirectory();
>> >   IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>> >       new MockAnalyzer(random()));
>> >   conf.setMaxBufferedDeleteTerms(1);
>> >   // 4 docs < 10, so the doc count alone never flushes a segment
>> >   conf.setMaxBufferedDocs(10);
>> >   conf.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
>> >   conf.setInfoStream(new PrintStreamInfoStream(System.out));
>> >   IndexWriter writer = new IndexWriter(dir, conf);
>> >   int numDocs = 4;
>> >   for (int i = 0; i < numDocs; i++) {
>> >     Document doc = new Document();
>> >     doc.add(new StringField("id", "doc-" + i, Store.NO));
>> >     writer.addDocument(doc);
>> >   }
>> >
>> >   System.out.println("before delete");
>> >   for (String f : dir.listAll()) System.out.println(f);
>> >
>> >   writer.deleteDocuments(new Term("id", "doc-0"));
>> >   writer.deleteDocuments(new Term("id", "doc-1"));
>> >
>> >   System.out.println("\nafter delete");
>> >   for (String f : dir.listAll()) System.out.println(f);
>> >
>> >   writer.close();
>> >   dir.close();
>> > }
>> >
>> > When InfoStream is turned on, I can see messages about flushing delete
>> > terms (unlike when I comment out the setMaxBufferedDeleteTerms line), so
>> > I know this setting takes effect.
>> > Yet both before and after the delete operations, dir.listAll() returns
>> > only the .fdx and .fdt files.
>> >
>> > So is it a bug that a segment isn't flushed? If not (and I'm OK with
>> > that), is it a documentation inconsistency?
>> > Strangely, I think, if the delTerms RAM accounting exhausts the max RAM
>> > buffer size, a new segment will be flushed?
>> >
>> > Slightly unrelated to FlushPolicy, but do I understand correctly that
>> > maxBufDelTerm does not apply to delete-by-query operations?
>> > BufferedDeletes doesn't increment any counter on addQuery(), so is it
>> > correct to assume that if I only delete-by-query, this setting has no
>> > effect?
>> > And are the delete queries buffered until the next segment is flushed
>> > due to other operations (constraints, commit, NRT reopen)?
>> >
>> > Shai
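The delete-trigger rules from Mike's reply can be sketched as a second toy model. Term deletes count toward setMaxBufferedDeleteTerms and cause deletes to be applied to existing segments. Delete-by-query only adds to RAM usage. Exceeding the RAM budget flushes a segment along with applying the deletes.

`DeleteTriggerModel`, its `Action` values, the RAM budget and the per-delete byte sizes are all hypothetical. This is not the actual code of FlushByRamOrCountPolicy. Flushes caused by other operations (merging, NRT reopen, commit) are not modeled.

```java
// Hypothetical toy model, NOT Lucene's FlushByRamOrCountPolicy: it only encodes
// the delete-trigger rules described in this thread.
public class DeleteTriggerModel {
  public enum Action { NONE, APPLY_DELETES, FLUSH_SEGMENT_AND_APPLY_DELETES }

  private final int maxBufferedDeleteTerms; // non-positive disables the count trigger (model choice)
  private final long ramLimitBytes;         // hypothetical RAM budget for buffered deletes
  private int bufferedDeleteTerms = 0;
  private long bufferedDeleteBytes = 0;

  public DeleteTriggerModel(int maxBufferedDeleteTerms, long ramLimitBytes) {
    this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    this.ramLimitBytes = ramLimitBytes;
  }

  // Delete-by-term: counted toward maxBufferedDeleteTerms and toward RAM usage.
  public Action deleteByTerm(long approxBytes) {
    bufferedDeleteTerms++;
    bufferedDeleteBytes += approxBytes;
    return check();
  }

  // Delete-by-query: counted toward RAM usage only, never toward maxBufferedDeleteTerms.
  public Action deleteByQuery(long approxBytes) {
    bufferedDeleteBytes += approxBytes;
    return check();
  }

  private Action check() {
    Action action = Action.NONE;
    if (bufferedDeleteBytes > ramLimitBytes) {
      // RAM trigger: a segment is flushed and the buffered deletes are applied.
      action = Action.FLUSH_SEGMENT_AND_APPLY_DELETES;
    } else if (maxBufferedDeleteTerms > 0 && bufferedDeleteTerms >= maxBufferedDeleteTerms) {
      // Count trigger: deletes are applied to existing segments; no new segment.
      action = Action.APPLY_DELETES;
    }
    if (action != Action.NONE) {
      // Buffered deletes have been consumed, so reset both counters.
      bufferedDeleteTerms = 0;
      bufferedDeleteBytes = 0;
    }
    return action;
  }

  public static void main(String[] args) {
    DeleteTriggerModel byTerm = new DeleteTriggerModel(1, 1024);
    System.out.println("term delete:   " + byTerm.deleteByTerm(16));
    DeleteTriggerModel byQuery = new DeleteTriggerModel(1, 1024);
    System.out.println("query delete:  " + byQuery.deleteByQuery(16));
    System.out.println("large queries: " + byQuery.deleteByQuery(2048));
  }
}
```

With maxBufferedDeleteTerms=1, `main` prints APPLY_DELETES for the single term delete and NONE for the small query delete. It prints FLUSH_SEGMENT_AND_APPLY_DELETES once the queries push buffered RAM past the hypothetical 1024-byte budget. That matches the observation that delete-by-query alone never hits the count trigger.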