[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014538#comment-13014538 ]

Simon Willnauer commented on LUCENE-2573:
-----------------------------------------

bq. Thanks, Simon, for running the benchmarks! Good results overall, even though it's puzzling why flushing would be CPU intensive.
Well, during flush we encode lots of VInts; that's what makes it CPU-intensive.
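
To show where the CPU time goes, here is a minimal standalone sketch of VInt encoding: 7 payload bits per byte, low-order bits first, with the high bit set when more bytes follow (the same scheme as Lucene's DataOutput.writeVInt). VIntSketch and its writeVInt helper are hypothetical names, and the code writes to a ByteArrayOutputStream rather than a Lucene IndexOutput.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical standalone sketch of VInt encoding (not Lucene's actual code). */
public class VIntSketch {

    // Appends the VInt encoding of i to out. Every iteration does a mask, an OR,
    // a shift and a byte write; repeated for millions of postings during a
    // flush, this per-value work adds up.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80); // low 7 bits plus the continuation flag
            i >>>= 7;                      // unsigned shift to the next 7 bits
        }
        out.write(i); // last byte, high bit clear
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, 300); // 300 needs two 7-bit groups
        byte[] bytes = out.toByteArray();
        System.out.println(bytes.length);                                  // 2
        System.out.printf("%02x %02x%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // ac 02
    }
}
```

Small values cost a single byte, which is why the format is used for postings, but every value still goes through the branchy loop above.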

I ran the benchmark through a profiler and found out what the problem with my 
benchmarks was.
When I indexed with DWPT, my HDD was so busy flushing segments concurrently 
that read performance suffered, and my indexing threads blocked on the 
line-doc file I read the records from. This explains the many spikes toward 
0 docs/sec. The profiler also showed that at least 3 threads were constantly 
waiting on ThreadState#lock(). I changed the thread pool so that it no longer 
clears the thread bindings when I replace a DWPT for flushing, and voilà! We 
now get a comparable peak ingest rate.
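
The thread-binding change can be illustrated with a simplified, hypothetical thread-affinity pool: each indexing thread is bound to a lockable slot (a ThreadState), and preparing a flush swaps only the DWPT inside that slot while the thread's binding stays in place. This is a sketch under assumptions, not the patch's code: AffinityPoolSketch, Dwpt, getAndLock, replaceForFlush and the round-robin assignment of new threads are all illustrative names and choices.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified thread-affinity pool; names are illustrative only. */
public class AffinityPoolSketch {

    // Stand-in for a DocumentsWriterPerThread; it only carries an id here.
    static final class Dwpt {
        final int id;
        Dwpt(int id) { this.id = id; }
    }

    // A lockable slot holding the DWPT its bound thread(s) currently write into.
    static final class ThreadState extends ReentrantLock {
        Dwpt dwpt;
        ThreadState(Dwpt dwpt) { this.dwpt = dwpt; }
    }

    private final ThreadState[] states;
    private final Map<Thread, ThreadState> bindings = new ConcurrentHashMap<>();
    private final AtomicInteger nextSlot = new AtomicInteger();
    private int nextDwptId;

    AffinityPoolSketch(int numStates) {
        states = new ThreadState[numStates];
        for (int i = 0; i < numStates; i++) {
            states[i] = new ThreadState(new Dwpt(nextDwptId++));
        }
    }

    // Returns the calling thread's bound slot, locked. A binding is created only
    // on first use (round-robin here, an assumption of this sketch), so a thread
    // keeps returning to "its" slot instead of piling onto another thread's lock.
    ThreadState getAndLock() {
        ThreadState state = bindings.computeIfAbsent(Thread.currentThread(),
            t -> states[Math.floorMod(nextSlot.getAndIncrement(), states.length)]);
        state.lock();
        return state;
    }

    // Swaps in a fresh DWPT and hands back the full one to be flushed by the
    // caller (flushing itself is not shown). The binding is left untouched.
    synchronized Dwpt replaceForFlush(ThreadState state) {
        Dwpt full = state.dwpt;
        state.dwpt = new Dwpt(nextDwptId++);
        return full;
    }
}
```

The design point is that a flush replaces the slot's contents, not the thread-to-slot mapping, so clearing and re-establishing bindings never funnels several threads onto the same ThreadState lock.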

!http://people.apache.org/~simonw/DocumentsWriterPerThread_dps_01.png! 

Note the difference: DWPT indexes the documents in 6 min 15 sec!

!http://people.apache.org/~simonw/Trunk_dps_01.png! 

Here, on trunk, it takes 13 min 40 sec! NICE!
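
Putting the two wall-clock times side by side, the DWPT run is a bit more than twice as fast as trunk on this benchmark:

```latex
\text{speedup} = \frac{13\,\text{min}\;40\,\text{s}}{6\,\text{min}\;15\,\text{s}}
               = \frac{820\,\text{s}}{375\,\text{s}} \approx 2.19
```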

!http://people.apache.org/~simonw/DocumentsWriterPerThread_flush_01.png! 


> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps between the low and high water marks: e.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water marks 
> explicitly using absolute values (e.g. low water mark at 120MB, high water mark 
> at 140MB)? Or should we, for simplicity, keep the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
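
The tiered approach in the issue description can be sketched as follows: with n active DWPTs, n flush thresholds are spaced linearly from the low to the high water mark, and each threshold crossed by the total RAM use means one more DWPT should be flushing. This is one hedged reading of "linear steps", not the patch's logic; TieredFlushSketch, thresholdsMB and numToFlush are hypothetical names, and the water marks are taken as absolute MB values (with a single setRAMBufferSizeMB() budget they would be derived as e.g. 90% and 110% of it).

```java
import java.util.Arrays;

/** Hypothetical sketch of tiered flush thresholds between two water marks. */
public class TieredFlushSketch {

    // Returns numDwpts thresholds (MB), linearly spaced from low to high, inclusive.
    static double[] thresholdsMB(double lowMB, double highMB, int numDwpts) {
        double[] t = new double[numDwpts];
        if (numDwpts == 1) {
            t[0] = lowMB; // a single DWPT flushes at the low water mark
            return t;
        }
        double step = (highMB - lowMB) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            t[i] = lowMB + i * step;
        }
        return t;
    }

    // Number of DWPTs that should be flushing at the given total RAM use:
    // one for every threshold that has been reached.
    static int numToFlush(double usedMB, double[] thresholds) {
        int n = 0;
        for (double threshold : thresholds) {
            if (usedMB >= threshold) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Water marks at 90 and 110 MB, i.e. 90% and 110% of a 100 MB budget.
        double[] t = thresholdsMB(90.0, 110.0, 5);
        System.out.println(Arrays.toString(t)); // [90.0, 95.0, 100.0, 105.0, 110.0]
        System.out.println(numToFlush(101.0, t)); // 3
    }
}
```

Taking absolute MB values keeps the sketch neutral between the two configuration options raised above: either the user supplies both water marks, or they are computed from the single RAM buffer size before calling thresholdsMB.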

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira