[
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Willnauer updated LUCENE-2573:
------------------------------------
Attachment: LUCENE-2573.patch
Here is my current state on this issue. I didn't add all the JDocs needed (by
far), and I will wait until we have settled on the API for FlushPolicy.
* I removed the complex TieredFlushPolicy entirely and added one
DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets biggest DWPT
pending.
* DW will stall threads if we reach 2 x maxNetRam which is retrieved from
FlushPolicy so folks can lower that depending on their env.
* DWFlushControl checks if a single DWPT grows too large and forcefully sets it
pending once its RAM consumption is > 1.9 GB. That should be enough headroom to
not reach the 2048MB limit. We should consider making this configurable.
* FlushPolicy now has three methods: onInsert, onUpdate and onDelete.
DefaultFlushPolicy only implements onInsert and onDelete; the abstract base
class just calls those on an update.
* I removed FlushControl from IW
* added documentation on IWC for FlushPolicy and removed the jdocs for the RAM
limit. I think we should add some lines about how RAM is now used and that
users should balance the RAM with the number of threads they are using. Will do
that later on though.
* For testing I added a ThrottledIndexOutput that makes flushing slow so I can
test if we are stalled and / or blocked. This is passed to
MockDirectoryWrapper. It's currently under util but it should rather go under
store, no?
* byte consumption is now committed before FlushPolicy is called, since we no
longer have the multitier flush, which required that to reliably proceed across
tier boundaries (not required, but it was easier to test really). So FP doesn't
need to take care of the delta.
* FlushPolicy now also flushes on maxBufferedDeleteTerms, but the buffered
delete term count is not yet connected to DW#getNumBufferedDeleteTerms(), which
causes some failures. I added //nocommit & @Ignore to those tests.
* this patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy.
I couldn't figure out why it is failing, but it could be due to an old
version of LUCENE-2881 on this branch. I will see if it still fails once we
have merged.
* Healthiness now doesn't stall if we are not flushing on RAM consumption to
ensure we don't lock in threads.
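
To make the interplay of the flush, stall and hard-limit bullets above a bit
more concrete, here is a minimal sketch of the decision logic as described in
these notes. It is not code from the patch: FlushControlSketch, its method
names, the string DWPT ids and the exact >= / > comparisons are assumptions
made for illustration. Only the rules themselves come from the notes (commit
the bytes first, set the biggest DWPT pending at the RAM limit, stall at 2 x
the limit, force a single DWPT pending above 1.9 GB).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; these are not the classes in the patch. */
final class FlushControlSketch {
    // Per-DWPT ceiling from the notes: 1.9 GB, kept below the 2048MB limit.
    // 1945 MB is an assumed rounding of 1.9 GB.
    static final long HARD_LIMIT_BYTES = 1945L * 1024 * 1024;

    private final long maxNetBytes;                            // stands in for the RAM buffer size
    private final Map<String, Long> active = new HashMap<>();  // DWPT id -> bytes still indexing
    private final Map<String, Long> pending = new HashMap<>(); // DWPT id -> bytes waiting to flush

    FlushControlSketch(long maxNetBytes) {
        this.maxNetBytes = maxNetBytes;
    }

    private static long sum(Map<String, Long> bytesById) {
        long total = 0;
        for (long v : bytesById.values()) {
            total += v;
        }
        return total;
    }

    /** Commits the byte delta first, then decides; returns the DWPT set pending, or null. */
    String onInsert(String dwpt, long deltaBytes) {
        long used = active.merge(dwpt, deltaBytes, Long::sum);
        if (used > HARD_LIMIT_BYTES) {
            return markPending(dwpt);            // single DWPT grew too large
        }
        if (sum(active) >= maxNetBytes) {
            return markPending(biggestActive()); // RAM limit reached: biggest DWPT goes first
        }
        return null;
    }

    private String biggestActive() {
        String best = null;
        for (Map.Entry<String, Long> e : active.entrySet()) {
            if (best == null || e.getValue() > active.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    private String markPending(String dwpt) {
        pending.merge(dwpt, active.remove(dwpt), Long::sum);
        return dwpt;
    }

    /** Called once a pending DWPT has been written out and its RAM released. */
    void onFlushed(String dwpt) {
        pending.remove(dwpt);
    }

    /** Indexing threads would block while active + pending RAM is at 2 x the limit. */
    boolean shouldStall() {
        return sum(active) + sum(pending) >= 2 * maxNetBytes;
    }
}
```

With a limit of 100 bytes, inserting 40 and then 30 more into one DWPT and 30
into another sets the first one pending; threads only stall once active plus
not-yet-flushed RAM adds up to 200.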
Overall this seems much closer now. I will start writing jdocs. Flush on
buffered delete terms might need some tests, and I should also write a more
reliable test for Healthiness... currently it relies on the
ThrottledIndexOutput slowing down indexing enough to block, which might not
be true all the time. It hasn't failed yet.
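
For readers who have not looked at the patch, the throttling idea used in the
test can be sketched roughly as below. This is a plain java.io stand-in, not
the ThrottledIndexOutput from the patch: the class name, the constructor
parameters and the pause counter are all hypothetical. The counter is only
there to show one way a test could observe that throttling actually happened
instead of relying on timing alone.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in that slows writes down by sleeping every few bytes. */
final class ThrottledOutputSketch extends OutputStream {
    private final OutputStream delegate;
    private final int bytesPerPause;  // assumed knob: bytes written between pauses
    private final long pauseMillis;   // assumed knob: length of each pause
    private long sinceLastPause;
    private long pauseCount;

    ThrottledOutputSketch(OutputStream delegate, int bytesPerPause, long pauseMillis) {
        this.delegate = delegate;
        this.bytesPerPause = bytesPerPause;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        if (++sinceLastPause >= bytesPerPause) {
            sinceLastPause = 0;
            pauseCount++;
            try {
                Thread.sleep(pauseMillis); // this is what makes "flushing" slow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
        }
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    /** Number of pauses taken so far; lets a test check that throttling kicked in. */
    long pauseCount() {
        return pauseCount;
    }
}
```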
> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly using total values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?
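
For reference, the linear steps from the original description can be computed
as in the sketch below. The class and method names are made up for
illustration, the water marks are expressed as fractions of the allowed RAM,
and using the low water mark when there is only a single DWPT is an assumption
(the description does not cover that case).

```java
/** Hypothetical helper illustrating the tiered water marks from the issue description. */
final class TieredWaterMarksSketch {
    /**
     * Returns one flush threshold per DWPT, as fractions of the allowed RAM,
     * spaced linearly from the low to the high water mark.
     */
    static double[] flushFractions(int numDwpts, double low, double high) {
        if (numDwpts < 1 || low > high) {
            throw new IllegalArgumentException("need numDwpts >= 1 and low <= high");
        }
        double[] marks = new double[numDwpts];
        if (numDwpts == 1) {
            marks[0] = low; // assumption: a single DWPT flushes at the low water mark
            return marks;
        }
        double step = (high - low) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            marks[i] = low + i * step;
        }
        return marks;
    }
}
```

Calling flushFractions(5, 0.90, 1.10) gives thresholds at roughly 90%, 95%,
100%, 105% and 110%, up to floating point rounding.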
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira