[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch

2011-04-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020206#comment-13020206
 ] 

Simon Willnauer commented on LUCENE-2571:
-

bq. Would you consider trying other MergePolicy objects on trunk? The 
BalancedSegment MP tries to avoid these long stoppages.

I think there is a misunderstanding on your side. The long stoppages on trunk 
are not due to merges at all. They are due to flushing the DocumentsWriter 
which essentially means stop the world. This is why we can not make any 
progress. Merges are NOT blocking indexing on trunk no matter which MP you use. 
The Balanced MP is rather suited for RT environments to make reopening the 
reader quicker. 

you should maybe look at this blog entry for a more complete explanation: 
http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/

 Indexing performance tests with realtime branch
 ---

 Key: LUCENE-2571
 URL: https://issues.apache.org/jira/browse/LUCENE-2571
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: wikimedium.realtime.Standard.nd10M_dps.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, 
 wikimedium.trunk.Standard.nd10M_dps.png, 
 wikimedium.trunk.Standard.nd10M_dps_addDocuments.png


 We should run indexing performance tests with the DWPT changes and compare to 
 trunk.
 We need to test both single-threaded and multi-threaded performance.
 NOTE:  flush by RAM isn't implemented just yet, so either we wait with the 
 tests or flush by doc count.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch

2011-04-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020217#comment-13020217
 ] 

Earwin Burrfoot commented on LUCENE-2571:
-

bq. Merges are NOT blocking indexing on trunk no matter which MP you use.
Well.. merges tie up IO (especially if not on fancy SSDs/RAIDs), which in turn 
lags flushes - bigger delays for stop the world flushes / lower bandwith cap 
(after which they are forced to stop the world) for parallel flushes.

So Lance's point is partially valid.

 Indexing performance tests with realtime branch
 ---

 Key: LUCENE-2571
 URL: https://issues.apache.org/jira/browse/LUCENE-2571
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: wikimedium.realtime.Standard.nd10M_dps.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, 
 wikimedium.trunk.Standard.nd10M_dps.png, 
 wikimedium.trunk.Standard.nd10M_dps_addDocuments.png


 We should run indexing performance tests with the DWPT changes and compare to 
 trunk.
 We need to test both single-threaded and multi-threaded performance.
 NOTE:  flush by RAM isn't implemented just yet, so either we wait with the 
 tests or flush by doc count.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch

2011-04-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020230#comment-13020230
 ] 

Simon Willnauer commented on LUCENE-2571:
-

bq. Well.. merges tie up IO (especially if not on fancy SSDs/RAIDs), which in 
turn lags flushes - bigger delays for stop the world flushes / lower bandwith 
cap (after which they are forced to stop the world) for parallel flushes.

True it will make a difference in certain situations but not for this benchmark 
RT does way more merges here since we are flushing way more segments. the time 
windows I used here is where we almost don't merge at all in the trunk run so 
it should not make a difference.

I ran those benchmarks again with BalancedSegmentMergePolicy and it doesn't 
make any difference really. see below
!wikimedium.trunk.Standard.nd10M_dps_BalancedSegmentMergePolicy.png!

 Indexing performance tests with realtime branch
 ---

 Key: LUCENE-2571
 URL: https://issues.apache.org/jira/browse/LUCENE-2571
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: wikimedium.realtime.Standard.nd10M_dps.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, 
 wikimedium.trunk.Standard.nd10M_dps.png, 
 wikimedium.trunk.Standard.nd10M_dps_BalancedSegmentMergePolicy.png, 
 wikimedium.trunk.Standard.nd10M_dps_addDocuments.png


 We should run indexing performance tests with the DWPT changes and compare to 
 trunk.
 We need to test both single-threaded and multi-threaded performance.
 NOTE:  flush by RAM isn't implemented just yet, so either we wait with the 
 tests or flush by doc count.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch

2011-04-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019890#comment-13019890
 ] 

Simon Willnauer commented on LUCENE-2571:
-

I run batch indexing benchmarks trunk vs. realtime branch with addDocument and 
with updateDocument. 

For add document I indexed 10M wikipedia docs into a spinning disk reading from 
a separate SSD

Here is the realtime graph:
!wikimedium.realtime.Standard.nd10M_dps_addDocuments.png!

vs. trunk:
!wikimedium.trunk.Standard.nd10M_dps_addDocuments.png!

This graph shows how DWPT is flushing to disk over time:

!wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png!

for updateDocument I build a 10M docs wiki index and indexed the exact same 
documents with updateDocument here are the results:
Realtime Branch:
!wikimedium.realtime.Standard.nd10M_dps.png!

trunk:
!wikimedium.trunk.Standard.nd10M_dps.png!



 Indexing performance tests with realtime branch
 ---

 Key: LUCENE-2571
 URL: https://issues.apache.org/jira/browse/LUCENE-2571
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: wikimedium.realtime.Standard.nd10M_dps.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, 
 wikimedium.trunk.Standard.nd10M_dps.png, 
 wikimedium.trunk.Standard.nd10M_dps_addDocuments.png


 We should run indexing performance tests with the DWPT changes and compare to 
 trunk.
 We need to test both single-threaded and multi-threaded performance.
 NOTE:  flush by RAM isn't implemented just yet, so either we wait with the 
 tests or flush by doc count.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch

2011-04-14 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020147#comment-13020147
 ] 

Lance Norskog commented on LUCENE-2571:
---

Would you consider trying other MergePolicy objects on trunk? The 
BalancedSegment MP tries to avoid these long stoppages.  

 Indexing performance tests with realtime branch
 ---

 Key: LUCENE-2571
 URL: https://issues.apache.org/jira/browse/LUCENE-2571
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: wikimedium.realtime.Standard.nd10M_dps.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, 
 wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, 
 wikimedium.trunk.Standard.nd10M_dps.png, 
 wikimedium.trunk.Standard.nd10M_dps_addDocuments.png


 We should run indexing performance tests with the DWPT changes and compare to 
 trunk.
 We need to test both single-threaded and multi-threaded performance.
 NOTE:  flush by RAM isn't implemented just yet, so either we wait with the 
 tests or flush by doc count.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org