[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

Simon Willnauer (JIRA) Tue, 08 Mar 2011 03:40:23 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Simon Willnauer updated LUCENE-2573:
------------------------------------

    Attachment: LUCENE-2573.patch

Here is an updated patch that writes more information to the infostream, 
including whether we are unhealthy and blocking threads.
I also fixed some issues with TestIndexWriterExceptions.

Another idea we should follow, IMO, is to see if we can piggyback on the 
indexing threads on commit / flushAll instead of waiting to be able to lock 
each DWPT and flush it sequentially. This should be fairly easy, since we can 
simply mark them as flushPending and let incoming indexing threads do the 
flush in parallel. Depending on how we index and how big the DWPTs are, this 
could give us another sizable gain. For instance, if you index and commit 
frequently, let's say every 10k docs (many folks do that), but keep on 
indexing, we should see concurrency helping us a lot, since commit is then not 
blocking all incoming indexing threads. I think we should spin off another 
issue once this is ready.
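
To make the flushPending idea concrete, here is a rough sketch. It is not 
taken from the patch: the class PiggybackFlushSketch, the Dwpt stand-in and 
all method names are hypothetical, and it ignores the hard part (checking a 
DWPT out so nobody indexes into it while it is being flushed). It only shows 
commit marking DWPTs instead of flushing them, and an incoming indexing 
thread claiming one pending flush before it indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class PiggybackFlushSketch {

    /** Hypothetical stand-in for a DocumentsWriterPerThread. */
    static final class Dwpt {
        final AtomicBoolean flushPending = new AtomicBoolean(false);
        final AtomicInteger bufferedDocs = new AtomicInteger();
        final AtomicInteger flushCount = new AtomicInteger();

        // Placeholder for writing a segment: just drop the buffered docs.
        void flush() {
            bufferedDocs.set(0);
            flushCount.incrementAndGet();
        }
    }

    final List<Dwpt> dwpts = new ArrayList<>();

    PiggybackFlushSketch(int numDwpts) {
        for (int i = 0; i < numDwpts; i++) {
            dwpts.add(new Dwpt());
        }
    }

    /** commit / flushAll: only mark non-empty DWPTs, do not flush them here. */
    void markAllFlushPending() {
        for (Dwpt d : dwpts) {
            if (d.bufferedDocs.get() > 0) {
                d.flushPending.set(true);
            }
        }
    }

    /**
     * Called by an incoming indexing thread. The compareAndSet ensures that
     * exactly one thread claims each pending flush. Returns true if this
     * call flushed a DWPT.
     */
    boolean helpFlushOnePending() {
        for (Dwpt d : dwpts) {
            if (d.flushPending.compareAndSet(true, false)) {
                d.flush();
                return true;
            }
        }
        return false;
    }

    /** Index one document into the given slot, helping with a flush first. */
    void addDocument(int slot) {
        helpFlushOnePending();
        dwpts.get(slot).bufferedDocs.incrementAndGet();
    }
}
```

With several indexing threads calling addDocument after a commit has marked 
the DWPTs, each thread would pick up a different pending flush, which is 
where the parallelism would come from.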

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
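
The linear steps described above can be sketched as follows. This is only an 
illustration of the arithmetic, not code from the patch: the class 
TieredFlushThresholds and the method thresholdsMB are made-up names, and the 
0.90 / 1.10 fractions are the example values from the description rather than 
settled defaults.

```java
final class TieredFlushThresholds {

    /**
     * Returns one flush threshold (in MB) per DWPT, spaced linearly from
     * lowFraction to highFraction of the configured RAM buffer size.
     */
    static double[] thresholdsMB(double ramBufferMB, int numDwpts,
                                 double lowFraction, double highFraction) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        double[] thresholds = new double[numDwpts];
        if (numDwpts == 1) {
            // Assumption: a single DWPT flushes at the low water mark.
            thresholds[0] = ramBufferMB * lowFraction;
            return thresholds;
        }
        double step = (highFraction - lowFraction) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            thresholds[i] = ramBufferMB * (lowFraction + i * step);
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // 5 DWPTs with a 100 MB buffer: thresholds at 90%, 95%, ..., 110%.
        for (double t : thresholdsMB(100.0, 5, 0.90, 1.10)) {
            System.out.println(Math.round(t) + " MB");
        }
    }
}
```

Under this scheme the single setRAMBufferSizeMB() value stays the only user 
setting. Explicit low and high marks in MB would simply replace the two 
fractions with absolute values.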

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
