[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011333#comment-13011333 ]
Michael McCandless commented on LUCENE-2573:
--------------------------------------------

Patch is looking better! I love how simple DefaultFP is now :)

* 1900 * 1024 * 1024 is actually 1.86 GB; maybe just change the comment to 1900 MB? Or we could really make the limit 1.9 GB (= 1945.6 MB).
* I think we should make the 1.9 GB limit changeable (setRAMPerThreadHardLimitMB?).
* How come we lost 'assert !bufferedDeletesStream.any();' in IndexWriter.java?
* Why default trackAllocations to true when the ctor always sets it (in TermsHash.java)?
* Can we simply skip invoking the FP if flushPending is already set for the given DWPT? (So that every FP need not check that.)
* In DefaultFP.onDelete -- we shouldn't just return if numDocsInRAM is 0, should we? Ie an app could open IW, delete 2M terms, close, and we'd need to flush several times due to RAM usage or del term count...
* Maybe rename DefaultFP --> FlushByRAMOrCounts?
* Won't the new do/while loop added to ThreadAffinityDWThreadPool run hot if minThreadState is constantly null...? (Separately, that source needs a header.)
* I love the ThrottledIndexOutput!
* For the InterruptedException in ThrottledIndexOutput.sleep, we should rethrow w/ oal.util.ThreadInterruptedException (best practice... probably doesn't really matter here).
* We should fix DefaultFlushPolicy to first pull the relevant config from IWC (eg maxBufferedDocs), then check whether that config is -1 or not, etc., because IWC's config can be changed at any time (live), so we may read eg 10000 the first time and then -1 the second time.

Maybe, for stalling, instead of triggering by max RAM, we can take this simple approach: if the number of flushing DWPTs ever exceeds one plus the number of active DWPTs, then we stall (and resume once it's below again). This approach would also work for flush-by-docCount policies, and would still roughly equate to up to 2X RAM usage for flush-by-RAM.

It's really odd that TestPersistentSDP fails now...
this should be unrelated to the (admittedly, major) changes we're making here... Hmm...

Deletes are actually tricky, because somehow the FlushPolicy needs access to the "global" deletes count (and also to the per-DWPT deletes count). If a given DWPT has 0 buffered docs, then indeed the buffered deletes in its pool don't matter. But we do need to respect the buffered deletes in the global pool...

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?

-- This message is automatically generated by JIRA.
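The linear-step water marks described in the quoted issue can be sketched as follows. This is a hypothetical illustration of the interpolation only; the class and method names are not Lucene's API:

```java
// Sketch of the tiered water-mark idea: with n DWPTs, per-slot flush
// triggers run from the low mark (e.g. 90%) to the high mark (e.g. 110%)
// of the RAM budget in equal linear steps. Illustrative names only.
public class TieredWaterMarks {

    /** Flush trigger (in MB) for DWPT slot i of n, interpolated
     *  linearly between lowPct and highPct of ramBudgetMB. */
    static double flushThresholdMB(double ramBudgetMB, int i, int n,
                                   double lowPct, double highPct) {
        if (n == 1) {
            return ramBudgetMB * lowPct; // single DWPT: flush at the low mark
        }
        double step = (highPct - lowPct) / (n - 1);
        return ramBudgetMB * (lowPct + i * step);
    }

    public static void main(String[] args) {
        // The example from the issue: 5 DWPTs on a 100 MB budget
        // flush at 90, 95, 100, 105 and 110 MB respectively.
        for (int i = 0; i < 5; i++) {
            System.out.printf("%.1f%n",
                flushThresholdMB(100.0, i, 5, 0.90, 1.10));
        }
    }
}
```

Note that the high mark deliberately exceeds 100%, matching the issue's point that flush-by-RAM can transiently overshoot the configured buffer size.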
For more information on JIRA, see: http://www.atlassian.com/software/jira
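The stalling heuristic floated in the comment above (stall indexing once the number of flushing DWPTs exceeds the number of active DWPTs plus one, resume when it drops back) could be sketched like this. All names here are illustrative, not actual Lucene code:

```java
// Sketch of the proposed stall rule: indexing threads block while
// flushingDWPTs > activeDWPTs + 1, which bounds total RAM at roughly
// 2X the configured budget regardless of the flush policy in use.
public class FlushStallControl {
    private final int activeDWPTs;
    private int flushingDWPTs;

    public FlushStallControl(int activeDWPTs) {
        this.activeDWPTs = activeDWPTs;
    }

    synchronized void flushStarted() {
        flushingDWPTs++;
    }

    synchronized void flushFinished() {
        flushingDWPTs--;
        notifyAll(); // wake stalled indexing threads to re-check the condition
    }

    /** True while incoming indexing threads should stall. */
    synchronized boolean isStalled() {
        return flushingDWPTs > activeDWPTs + 1;
    }

    /** Blocks the calling indexing thread until the pool is no longer stalled. */
    synchronized void waitIfStalled() throws InterruptedException {
        while (isStalled()) {
            wait();
        }
    }
}
```

Because the rule counts DWPTs rather than bytes, it applies equally to flush-by-docCount policies, which is the advantage the comment points out over a max-RAM trigger.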