[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013974#comment-13013974 ]

Simon Willnauer commented on LUCENE-2573:
-----------------------------------------

I ran a couple of benchmarks with interesting results. The graph below shows 
documents per second for the RT branch: with DWPT we get very good IO/CPU 
utilization, and overall throughput is much better than trunk's.
!http://people.apache.org/~simonw/DocumentsWriterPerThread_dps.png! 
Yet when we look at trunk, its peak performance is much better than DWPT's. 
I think part of the reason is that we flush concurrently, which takes at most 
one thread out of the indexing loop; those are the little drops in docs/sec. 
That does not yet explain the consistently lower maximum indexing rate. I 
suspect it is at least partly because flushing is very CPU-intensive. At the 
same time, CMS (ConcurrentMergeScheduler) might kick in much more often, since 
we are writing more segments, which are also smaller than on trunk. 
Eventually I need to run a profiler and see what is going on.
!http://people.apache.org/~simonw/Trunk_dps.png! 

Interestingly, besides the nice CPU utilization we also get nearly perfect IO 
utilization. The graph below shows that we are consistently using IO to flush 
segments. The width of each bar shows the time it took to flush a single DWPT; 
there is almost no overlap.
!http://people.apache.org/~simonw/DocumentsWriterPerThread_flush.png! 

Overall, these are super results! Good job, everybody!

simon

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps between the low and high water marks. E.g. when 5 DWPTs 
> are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using absolute values (e.g. low water mark at 120MB, high water 
> mark at 140MB)? Or, for simplicity, shall we keep the single 
> setRAMBufferSizeMB() config method and use something like 90% and 110% of it 
> for the water marks?
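The tiered schedule in the issue description can be sketched as a small Java class. This is a hypothetical illustration, not Lucene code: the class name `TieredFlushSchedule` and its methods are made up here, and it assumes the "single setRAMBufferSizeMB() plus 90%/110%" option from the question above. The k-th DWPT (0-based) out of n is flushed once total RAM across all DWPTs reaches low + k * (high - low) / (n - 1) percent of the buffer, so with 5 DWPTs the steps come out at 90%, 95%, 100%, 105% and 110%.

```java
/**
 * Hypothetical sketch of tiered DWPT flushing with low/high water marks.
 * Names are illustrative only; this is not Lucene's implementation.
 */
public final class TieredFlushSchedule {
    private final long ramBufferBytes; // assumed to come from setRAMBufferSizeMB()
    private final int lowPercent;      // low water mark, e.g. 90
    private final int highPercent;     // high water mark, e.g. 110

    public TieredFlushSchedule(double ramBufferSizeMB, int lowPercent, int highPercent) {
        if (lowPercent > highPercent) {
            throw new IllegalArgumentException("low water mark above high water mark");
        }
        this.ramBufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
        this.lowPercent = lowPercent;
        this.highPercent = highPercent;
    }

    /** Total-RAM threshold (bytes) at which the k-th DWPT (0-based) of n is flushed. */
    public long threshold(int k, int n) {
        if (n <= 0 || k < 0 || k >= n) {
            throw new IllegalArgumentException("need 0 <= k < n");
        }
        if (n == 1) {
            // A single DWPT simply flushes at the low water mark.
            return ramBufferBytes * lowPercent / 100;
        }
        // Linear step: low + k * (high - low) / (n - 1), in integer math to avoid rounding.
        long numerator = (long) lowPercent * (n - 1) + (long) k * (highPercent - lowPercent);
        return ramBufferBytes * numerator / (100L * (n - 1));
    }

    /** How many of the n DWPTs should be flushing, given total RAM used by all of them. */
    public int dwptsToFlush(long totalRamBytes, int n) {
        int count = 0;
        while (count < n && totalRamBytes >= threshold(count, n)) {
            count++;
        }
        return count; // count == n means the high water mark was hit: flush all
    }

    public static void main(String[] args) {
        TieredFlushSchedule schedule = new TieredFlushSchedule(100.0, 90, 110);
        int n = 5;
        for (int k = 0; k < n; k++) {
            // Prints each step as a percentage of the configured buffer.
            long t = schedule.threshold(k, n);
            System.out.println("DWPT " + k + " flushes at " + (t * 100 / schedule.ramBufferBytes) + "%");
        }
    }
}
```

With a 100MB buffer and 5 DWPTs, a total of exactly 100MB crosses the first three steps, so `dwptsToFlush` returns 3; reaching 110MB returns 5, i.e. everything flushes at the high water mark. Keeping the interpolation in integer arithmetic makes the step boundaries exact rather than subject to floating-point rounding.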

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
