[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-2573:
------------------------------------

    Attachment: LUCENE-2573.patch

Here is an updated patch that writes more information to the infostream, including whether we are unhealthy and blocking threads. I also fixed some issues with TestIndexWriterExceptions.

Another idea we should follow, IMO, is to see if we can piggyback on the indexing threads on commit / flushAll instead of waiting to be able to lock each DWPT and flush it sequentially. This should be fairly easy, since we can simply mark the DWPTs as flushPending and let incoming indexing threads do the flushes in parallel. Depending on how we index and how big the DWPTs are, this could give us another sizable gain. For instance, if you index and commit frequently, let's say every 10k docs (many folks do things like that), but keep on indexing, we should see concurrency helping us a lot, since commit is then not blocking all incoming indexing threads.

I think we should spin off another issue once this is ready.

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
>                      LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
>   used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly using total values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?
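
To make the tiered scheme from the quoted issue description concrete, here is a minimal sketch of how the per-DWPT flush thresholds could be computed with linear steps between the low and high water marks. It is only an illustration under stated assumptions, not code from the patch: the class TieredFlushThresholds and its methods thresholds() and dwptsToFlush() are hypothetical names, the water marks are passed as percentages of the RAM buffer, and "how many DWPTs to flush" is read as "how many thresholds have been reached".

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Lucene or the attached patch. */
final class TieredFlushThresholds {

    /**
     * Returns one RAM threshold per DWPT, spaced linearly from the low water
     * mark to the high water mark. Units are whatever ramBuffer is given in.
     */
    static long[] thresholds(long ramBuffer, int numDwpts, int lowPercent, int highPercent) {
        if (numDwpts < 1 || lowPercent > highPercent) {
            throw new IllegalArgumentException("need numDwpts >= 1 and lowPercent <= highPercent");
        }
        long low = ramBuffer * lowPercent / 100;
        long high = ramBuffer * highPercent / 100;
        long[] result = new long[numDwpts];
        for (int i = 0; i < numDwpts; i++) {
            // A single DWPT flushes at the low water mark; otherwise step linearly up to high.
            result[i] = numDwpts == 1 ? low : low + (high - low) * i / (numDwpts - 1);
        }
        return result;
    }

    /** Counts the thresholds already reached, i.e. how many DWPTs should have been flushed (assumed reading). */
    static int dwptsToFlush(long[] thresholds, long usedRam) {
        int count = 0;
        for (long t : thresholds) {
            if (usedRam >= t) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 5 DWPTs, buffer of 1000 units, water marks at 90% and 110%.
        long[] t = thresholds(1000L, 5, 90, 110);
        System.out.println(Arrays.toString(t));      // [900, 950, 1000, 1050, 1100]
        System.out.println(dwptsToFlush(t, 1000L));  // 3
    }
}
```

With 5 DWPTs this reproduces the 90%, 95%, 100%, 105% and 110% steps from the description. Passing percentages keeps the single setRAMBufferSizeMB() setting as the only user-facing knob; the explicit-totals alternative raised in the question would instead take the low and high values directly.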