[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014016#comment-13014016 ]

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------

bq. influenced due to the fact that flushing is very very CPU intensive

Do you think this is due mostly to the vint decoding?  We're not interleaving 
postings on flush with this patch, so the CPU consumption should be somewhat 
lower.

bq. At the same time CMS might kick in way more often since we are writing more 
segments which are also smaller compared to trunk

This is probably the more likely case.  In general, we may be able to default 
to a higher overall RAM buffer size, and perhaps indexing performance won't 
degrade the way it does with trunk?  In the future, with RT, we could get fancy 
and selectively merge segments as we flush, if writing larger segments is 
important.  

I'd personally prefer to write out 1-2 GB segments and limit the number of 
DWPTs to 2-3, mainly for servers that are concurrently indexing and searching 
(e.g., the RT use case).  I think the current default number of thread states 
is a bit high.  
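The segment-size reasoning above can be made concrete with rough arithmetic. This is a sketch under stated assumptions, not Lucene code: it assumes the shared RAM buffer fills evenly across DWPTs and that each DWPT flushes its own segment, and it works in in-RAM MB, which is not the same as on-disk segment size. The class and method names are hypothetical, and the DWPT count of 8 is only an illustrative value, not a claim about the actual default.

```java
/**
 * Rough estimate of per-segment size when one shared RAM buffer is split
 * across DocumentsWriterPerThreads (DWPTs). Assumptions (not Lucene code):
 * the buffer fills evenly across DWPTs, each DWPT flushes its own segment,
 * and sizes are in-RAM MB rather than on-disk segment sizes.
 */
public class SegmentSizeEstimate {

    /** Approximate in-RAM MB that each DWPT holds when it flushes. */
    static double approxSegmentMB(double ramBufferMB, int numDwpts) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        return ramBufferMB / numDwpts;
    }

    /** Total RAM buffer (MB) needed for each DWPT to reach targetSegmentMB. */
    static double requiredRamBufferMB(double targetSegmentMB, int numDwpts) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        return targetSegmentMB * numDwpts;
    }

    public static void main(String[] args) {
        double ramBufferMB = 2048;
        // Fewer DWPTs sharing the same buffer means larger flushed segments.
        for (int n : new int[] {2, 3, 8}) {
            System.out.printf("%d DWPTs, %.0f MB buffer -> ~%.0f MB per segment%n",
                    n, ramBufferMB, approxSegmentMB(ramBufferMB, n));
        }
        // Buffer needed for ~1 GB segments with 3 DWPTs.
        System.out.printf("1024 MB segments x 3 DWPTs -> %.0f MB buffer%n",
                requiredRamBufferMB(1024, 3));
    }
}
```

Under these assumptions, reaching 1-2 GB segments with 2-3 DWPTs implies a total buffer in the low gigabytes, which is why a higher default RAM buffer and fewer DWPTs go together.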

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
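A minimal sketch of the linear-step scheme described above, in Java. The i-th value is the total-RAM level at which one more DWPT gets flushed, so by the high water mark every DWPT has been flushed. The class and method names are hypothetical and not part of Lucene's API; the 90%/110% factors come from the description, and deriving both marks from a single RAM buffer value is just one of the two configuration options being asked about.

```java
import java.util.Arrays;

/**
 * Tiered flush thresholds: the first DWPT flushes at the low water mark,
 * further DWPTs flush at linearly spaced steps, and by the high water mark
 * every DWPT has been flushed. Hypothetical helper, not Lucene API.
 */
public class TieredFlushThresholds {

    /** One flush threshold (MB) per active DWPT, from lowMB up to highMB. */
    static double[] thresholds(double lowMB, double highMB, int numDwpts) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        if (highMB < lowMB) {
            throw new IllegalArgumentException("highMB must be >= lowMB");
        }
        double[] result = new double[numDwpts];
        if (numDwpts == 1) {
            result[0] = lowMB; // a single DWPT flushes at the low water mark
            return result;
        }
        double step = (highMB - lowMB) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            result[i] = lowMB + i * step;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: both marks derived from one setRAMBufferSizeMB()-style
        // value using the 90% / 110% factors from the description.
        double ramBufferMB = 100.0;
        double lowMB = ramBufferMB * 90 / 100;
        double highMB = ramBufferMB * 110 / 100;
        System.out.println(Arrays.toString(thresholds(lowMB, highMB, 5)));
    }
}
```

With a 100 MB buffer and 5 DWPTs this prints [90.0, 95.0, 100.0, 105.0, 110.0], matching the 90%, 95%, 100%, 105% and 110% steps in the description.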

