[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-2573:
------------------------------------

    Attachment: LUCENE-2573.patch

Here is my current state on this issue. I didn't add all the jdocs needed (by far), and I will wait until we have settled on the API for FlushPolicy.

* I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest DWPT pending.
* DW will stall threads if we reach 2 x maxNetRam, which is retrieved from FlushPolicy so folks can lower that depending on their env.
* DWFlushControl checks if a single DWPT grows too large and sets it forcefully pending once its RAM consumption is > 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable.
* FlushPolicy now has three methods: onInsert, onUpdate and onDelete. DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update.
* I removed FlushControl from IW.
* I added documentation on IWC for FlushPolicy and removed the jdocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later on though.
* For testing I added a ThrottledIndexOutput that makes flushing slow so I can test if we are stalled and / or blocked. This is passed to MockDirectoryWrapper. It's currently under util but it should rather go under store, no?
* Byte consumption is now committed before FlushPolicy is called, since we don't have the multitier flush which required that to reliably proceed across tier boundaries (not required, but it was easier to test really). So FP doesn't need to take care of the delta.
* FlushPolicy now also flushes on maxBufferedDeleteTerms, but the buffered delete terms count is not yet connected to DW#getNumBufferedDeleteTerms(), which causes some failures. I added //nocommit & @Ignore to those tests.
* This patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy. I couldn't figure out why it is failing, but it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we have merged.
* Healthiness now doesn't stall if we are not flushing on RAM consumption, to ensure we don't lock in threads.

Overall this seems much closer now. I will start writing jdocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness. Currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not be true all the time. It didn't fail yet.

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly using total values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?
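
For illustration only, the linear-step idea in the issue description could be sketched roughly as below. The class and method names (TieredWatermarks, flushTriggers) and the percent parameters are hypothetical; they are not part of Lucene or of the attached patch, which drops the tiered policy in favour of DefaultFlushPolicy. Working in whole bytes with integer arithmetic is also an assumption, made here so the steps stay exact.

```java
import java.util.Arrays;

/** Hypothetical sketch of linear flush triggers between a low and a high water mark. */
final class TieredWatermarks {

    /**
     * Returns one flush trigger (in bytes) per DWPT, spaced linearly from
     * lowPercent to highPercent of the configured RAM buffer.
     */
    static long[] flushTriggers(long ramBufferBytes, int lowPercent, int highPercent, int numDWPTs) {
        if (numDWPTs < 1 || lowPercent > highPercent) {
            throw new IllegalArgumentException("need numDWPTs >= 1 and lowPercent <= highPercent");
        }
        long low = ramBufferBytes * lowPercent / 100;
        long high = ramBufferBytes * highPercent / 100;
        long[] triggers = new long[numDWPTs];
        if (numDWPTs == 1) {
            // With a single DWPT there are no intermediate steps; use the low mark.
            triggers[0] = low;
            return triggers;
        }
        for (int i = 0; i < numDWPTs; i++) {
            // Step i sits i/(numDWPTs - 1) of the way from the low to the high mark.
            triggers[i] = low + (high - low) * i / (numDWPTs - 1);
        }
        return triggers;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] triggers = flushTriggers(100 * mb, 90, 110, 5);
        long[] inMB = new long[triggers.length];
        for (int i = 0; i < triggers.length; i++) {
            inMB[i] = triggers[i] / mb;
        }
        System.out.println(Arrays.toString(inMB));
    }
}
```

With a 100 MB buffer, 90%/110% marks and 5 DWPTs, the triggers land at 90, 95, 100, 105 and 110 MB, matching the example in the description.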