[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Next iteration:

* Added JavaDoc to all classes and interfaces.
* Fixed the possible hot loop in DWPTThreadPool.
* Changed the stalling logic to block if more DWPTs are flushing than are active.
* Check IWC settings on init and listen to live changes for those that have not been disabled.
* Made the hard per-thread RAM limit configurable and added DEFAULT_RAM_PER_THREAD_HARD_LIMIT, set to 1945MB.
* Renamed DefaultFP --> FlushByRAMOrCounts.
* Added setFlushDeletes to DWFlushControl, which is checked each time we add a delete; if set, we flush the global deletes.

This seems somewhat close now. It's time to benchmark it again.

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep, for simplicity, the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
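The linear-step scheme from the issue description can be sketched as follows. This is a hypothetical illustrative helper, not code from any of the patches; it just assumes the per-tier thresholds are spread evenly between the low (90%) and high (110%) marks:

```java
// Hypothetical sketch of the tiered water marks discussed above: with n DWPTs,
// flush thresholds are spread linearly between the low and high marks, e.g.
// 90%, 95%, 100%, 105%, 110% of the configured RAM buffer for 5 DWPTs.
class TieredWaterMarks {
    private final double lowFraction;   // e.g. 0.90
    private final double highFraction;  // e.g. 1.10
    private final long ramBufferBytes;  // setRAMBufferSizeMB() converted to bytes

    TieredWaterMarks(long ramBufferBytes, double lowFraction, double highFraction) {
        this.ramBufferBytes = ramBufferBytes;
        this.lowFraction = lowFraction;
        this.highFraction = highFraction;
    }

    /** Threshold in bytes for the i-th tier (0-based) out of numDwpts tiers. */
    long threshold(int tier, int numDwpts) {
        double step = numDwpts > 1 ? (highFraction - lowFraction) / (numDwpts - 1) : 0.0;
        return (long) (ramBufferBytes * (lowFraction + step * tier));
    }
}
```

With a 1000-byte buffer and 5 DWPTs this yields the 900/950/1000/1050/1100 ladder described above.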
[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is my current state on this issue. I didn't add all the JavaDocs needed (by far); I will wait until we have settled on the API for FlushPolicy.

* I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest DWPT pending.
* DW will stall threads if we reach 2 x maxNetRam, which is retrieved from the FlushPolicy so folks can lower it depending on their environment.
* DWFlushControl checks if a single DWPT grows too large and forcefully sets it pending once its RAM consumption exceeds 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable.
* FlushPolicy now has three methods, onInsert, onUpdate and onDelete, while DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update.
* I removed FlushControl from IW.
* Added documentation on IWC for FlushPolicy and removed the JavaDocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later, though.
* For testing I added a ThrottledIndexOutput that makes flushing slow, so I can test whether we are stalled and/or blocked. This is passed to MockDirectoryWrapper. It's currently under util, but it should rather go under store, no?
* Byte consumption is now committed before the FlushPolicy is called, since we no longer have the multi-tier flush which required that to reliably proceed across tier boundaries (not required, but it was easier to test). So the FP doesn't need to take care of the delta.
* FlushPolicy now also flushes on maxBufferedDeleteTerms, while the buffered delete terms are not yet connected to DW#getNumBufferedDeleteTerms(), which causes some failures. I added //nocommit & @Ignore to those tests.
* This patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy, which I couldn't figure out why it is failing; it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we have merged.
* Healthiness now doesn't stall if we are not flushing on RAM consumption, to ensure we don't lock in threads.

Overall this seems much closer now. I will start writing JavaDocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness... currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not always be true. It hasn't failed yet.
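The FlushPolicy split sketched in the comment above (an abstract base class whose update hook forwards to the insert and delete hooks) might look roughly like this. The method names onInsert/onUpdate/onDelete come from the comment; the signatures and bodies are hypothetical:

```java
// Rough sketch of the FlushPolicy shape described above. Only the method names
// (onInsert/onUpdate/onDelete) are taken from the comment; everything else is
// a hypothetical simplification.
abstract class FlushPolicySketch {
    /** Called after a document was added to a DWPT. */
    abstract void onInsert(long activeBytes);

    /** Called after a delete was buffered. */
    abstract void onDelete(int numBufferedDeleteTerms);

    /** An update is a delete followed by an insert, so the base class simply
     *  forwards to both hooks; concrete policies may override. */
    void onUpdate(long activeBytes, int numBufferedDeleteTerms) {
        onDelete(numBufferedDeleteTerms);
        onInsert(activeBytes);
    }
}

// A minimal concrete policy mirroring the described DefaultFlushPolicy: mark a
// flush pending once the RAM buffer is exceeded or too many deletes are buffered.
class DefaultFlushPolicySketch extends FlushPolicySketch {
    private final long ramBufferBytes;
    private final int maxBufferedDeleteTerms;
    boolean flushPending;   // the real code would mark the biggest DWPT pending

    DefaultFlushPolicySketch(long ramBufferBytes, int maxBufferedDeleteTerms) {
        this.ramBufferBytes = ramBufferBytes;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    @Override void onInsert(long activeBytes) {
        if (activeBytes > ramBufferBytes) flushPending = true;
    }

    @Override void onDelete(int numBufferedDeleteTerms) {
        if (numBufferedDeleteTerms >= maxBufferedDeleteTerms) flushPending = true;
    }
}
```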
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Next iteration, containing a large number of refactorings:

* I moved all responsibilities related to flushing, including synchronization, into the DocsWriterSession and renamed it to DocumentsWriterFlushControl.
* DWFC now only tracks active and flush bytes, since the relict from my initial patch where pending memory was tracked is no longer needed.
* DWFC took over all synchronization, so there is no synchronized (flushControl) {...} in DocumentsWriter anymore. Seems way cleaner too.
* Healthiness now blocks once we reach 2x maxMemory, and SingleTierFlushPolicy uses 0.9 x maxRam as its low watermark and 2x the low watermark as its high watermark to flush all threads. The multi-tier one is still unchanged and flushes in linear steps from 0.9 to 1.10 x maxRam. We should actually test whether this does better or worse than the single-tier FP.
* FlushPolicy now has only a visit method and uses IW.message to write to the info stream.
* ThreadState now holds a boolean flag that indicates whether a flush is pending, which is synced and written by DWFC. States[] is gone in DWFC.
* FlushSpecification is gone, and DWFC returns the DWPT upon checkoutForFlush. Yet, I still track the memory for the flushing DWPT separately, since DWPT#bytesUsed() changes during flush and I don't want to rely on it not changing. As a nice side effect, I can check whether a checked-out DWPT is passed to doAfterFlush and assert on that.

Next steps here are benchmarking and getting good defaults for the flush policies. I think we are close though.
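The active/flush byte accounting described in the comment above can be sketched as follows. This is a hypothetical simplification of what a flush control might track, not the actual DocumentsWriterFlushControl code; in particular, freezing a DWPT's size at checkout mirrors the remark that DWPT#bytesUsed() changes during flush:

```java
// Hypothetical sketch of the two-bucket byte accounting described above:
// bytes move from the "active" bucket to the "flushing" bucket when a DWPT
// is checked out for flush, and are released when the flush completes.
class FlushControlSketch {
    private long activeBytes;
    private long flushBytes;

    synchronized void docWritten(long delta) {
        activeBytes += delta;               // indexing grows the active bucket
    }

    /** Check a DWPT out for flushing: its size is frozen at checkout time so
     *  a changing DWPT#bytesUsed() during flush cannot skew the totals. */
    synchronized long checkoutForFlush(long dwptBytes) {
        activeBytes -= dwptBytes;
        flushBytes += dwptBytes;
        return dwptBytes;                   // caller remembers the frozen size
    }

    synchronized void doAfterFlush(long frozenBytes) {
        flushBytes -= frozenBytes;          // flush finished, release its bytes
    }

    synchronized long netBytes() {
        return activeBytes + flushBytes;    // input to the stall/healthiness check
    }
}
```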
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Next iteration on this patch. I changed some naming issues and separated out a ByRAMFlushPolicy as an abstract base class. This patch contains the original MultiTierFlushPolicy and a SingleTierFlushPolicy that has only a low and a high watermark. This policy tries to flush the biggest DWPT once the LW is crossed and flushes all DWPTs once the HW is crossed.

This patch also adds a "flush if stalled" control that hijacks indexing threads if the DW is stalled and there are still pending flushes. If so, the incoming thread tries to check out a pending DWPT and flushes it before it adds the actual document.

I didn't benchmark the more complex MultiTierFP vs. SingleTierFP yet. I hope I get to this soon.
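The "flush if stalled" hijacking described above amounts to roughly the following control flow on the indexing path. This is a hypothetical sketch, not code from the patch; all names here are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the "flush if stalled" control described above: when
// the writer is stalled and flushes are still pending, an incoming indexing
// thread is hijacked to drain a pending flush before adding its own document.
class StalledFlushSketch {
    interface Dwpt { void flush(); }

    private final Queue<Dwpt> pendingFlushes = new ArrayDeque<>();
    volatile boolean stalled;
    int hijackedFlushes;    // for illustration: how often a thread was hijacked

    synchronized void markPending(Dwpt dwpt) {
        pendingFlushes.add(dwpt);
    }

    private synchronized Dwpt checkoutPending() {
        return pendingFlushes.poll();
    }

    /** Called on the indexing path before a document is added. */
    void preUpdate() {
        while (stalled) {
            Dwpt pending = checkoutPending();
            if (pending == null) {
                break;              // nothing left to help with; real code would wait
            }
            pending.flush();        // hijack this thread to do flush work
            hijackedFlushes++;
        }
    }
}
```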
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is an updated patch that writes more to the infostream, including whether we are unhealthy and blocking threads. I also fixed some issues with TestIndexWriterExceptions.

Another idea we should follow, IMO, is to see whether we can piggyback the indexing threads on commit / flushAll instead of waiting to be able to lock each DWPT and flush it sequentially. This should be fairly easy, since we can simply mark them as flushPending and let incoming indexing threads do the flush in parallel. Depending on how we index and how big the DWPTs are, this could give us another sizable gain. For instance, if you index and frequently commit, let's say every 10k docs (many folks do stuff like that), but keep on indexing, we should see concurrency helping us a lot, since commit would not block all incoming indexing threads. I think we should spin off another issue once this is ready.
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is my first cut / current status on this issue. First of all, I have a couple of failures related to deletes, but they seem not to be related (directly) to this patch, since I can reproduce them even without it. All of the failures are related to deletes in some way, so I suspect there is another issue for that, no?

This patch implements a tiered flush strategy combined with a concurrent flush approach.

* All decisions are based on a FlushPolicy which operates on a DocumentsWriterSession (which does the RAM tracking and housekeeping). Once the flush policy encounters a transition to the next tier, it marks the "largest" RAM-consuming thread as flushPending if we transition from a lower level, and all threads if we transition from the upper watermark (level). DocumentsWriterSession shifts the memory of a pending thread to a new memory "level" (pendingBytes) and marks the thread as pending.
* Once FlushPolicy#findFlushes(..) returns, the caller checks whether it itself needs to flush; if so, it "checks out" its DWPT, replaces it with a completely new instance, releases the lock on the ThreadState, and continues to flush the "checked-out" DWPT. After this is done, or if the current DWPT doesn't need flushing, the indexing thread checks whether there are any other pending flushes and tries to obtain their lock (non-blocking). It only tries to get the lock, and only tries once, since if the lock is taken, another thread is already holding it and will see the flushPending once it has finished adding its document.

This approach tries to utilize as much concurrency as possible while flushing a DWPT and releasing its ThreadState with an entirely new DWPT. Yet, this might also cause problems, especially if IO is slow and we are filling up indexing RAM too fast. To prevent us from bloating memory too much, I introduced a notion of "healthiness" which operates on the net bytes used in the DocumentsWriterSession (flushBytes + pendingBytes + activeBytes, where flushBytes is the memory consumption of currently flushing DWPTs, pendingBytes the memory consumption of ThreadStates/DWPTs marked as pending, and activeBytes the memory consumption of the indexing DWPTs). If net bytes reach a certain threshold (currently 2*maxRam), I stop incoming threads until the session becomes healthy again.

I ran luceneutil with trunk vs. LUCENE-2573, indexing 300k Wikipedia docs with a 1GB max RAM buffer and 4 threads. Searches on both indexes yield identical results (phew!). Indexing times in ms look promising:

||trunk||patch||diff||
|134129 ms|102932 ms|{color:green}23.25%{color}|

This patch is still kind of rough and needs iterations, so reviews and questions are very much welcome.
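The "healthiness" gate described above (stall incoming threads once net bytes reach 2*maxRam) can be sketched roughly as follows. This is a hypothetical simplification of the described mechanism, not the patch's code:

```java
// Hypothetical sketch of the "healthiness" check described above: net bytes
// are flushBytes + pendingBytes + activeBytes, and incoming indexing threads
// are stalled once that total reaches 2 * maxRam, until it drops back.
class HealthinessSketch {
    private final long maxRamBytes;
    long flushBytes, pendingBytes, activeBytes;

    HealthinessSketch(long maxRamBytes) {
        this.maxRamBytes = maxRamBytes;
    }

    long netBytes() {
        return flushBytes + pendingBytes + activeBytes;
    }

    /** True once indexing has outrun flushing and threads should be stalled. */
    boolean isUnhealthy() {
        return netBytes() >= 2 * maxRamBytes;
    }
}
```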
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2573:
    Component/s: Index
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

There was a small bug in the choice of the max DWPT: all DWPTs, including ones that were already scheduled to flush, were being compared against the current DWPT (i.e. the one being examined for possible flushing).
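The fix described above boils down to excluding already-flushing DWPTs when picking the largest one. A hypothetical sketch of that selection (the class and field names are illustrative, not from the patch):

```java
// Hypothetical sketch of the selection the bug fix above is about: when
// looking for the biggest DWPT to flush, DWPTs already scheduled to flush
// must be skipped, otherwise they keep "winning" the size comparison.
class MaxDwptSelector {
    static final class DwptState {
        final long bytesUsed;
        final boolean flushPending;   // already scheduled to flush
        DwptState(long bytesUsed, boolean flushPending) {
            this.bytesUsed = bytesUsed;
            this.flushPending = flushPending;
        }
    }

    /** Returns the index of the largest non-pending DWPT, or -1 if none. */
    static int findBiggestNonPending(DwptState[] states) {
        int best = -1;
        long bestBytes = -1;
        for (int i = 0; i < states.length; i++) {
            if (states[i].flushPending) continue;   // the fix: skip flushing DWPTs
            if (states[i].bytesUsed > bestBytes) {
                bestBytes = states[i].bytesUsed;
                best = i;
            }
        }
        return best;
    }
}
```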
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

* perDocAllocator is removed from DocumentsWriterRAMAllocator.
* getByteBlock and getIntBlock always increment numBytesUsed.

The test that simply prints out debugging messages looks better. I need to figure out unit tests.
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here's a first cut... The TestDWPTFlushByRAM test doesn't do much at this point. It adds documents in 2 threads and prints the RAM usage to stdout. It more or less shows the tiered flushing working. I don't think we're tracking all of the RAM usage yet, maybe just the terms? I need to review.