[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983246#action_12983246 ]
Michael Busch commented on LUCENE-2324:
---------------------------------------

{quote}
I ran a quick perf test here: I built the 10M Wikipedia index, Standard codec, using 6 threads. Trunk took 541.6 sec; RT took 518.2 sec (only a bit faster), but the test wasn't really fair because it flushed @ docCount=12870.
{quote}

Thanks for running the tests! Hmm that's a bit disappointing - we were hoping for more speedup. Flushing by docCount is currently per DWPT, so every initial segment in your test had 12870 docs. I guess there's a lot of merging happening. Maybe you could rerun with higher docCount?

bq. But I can't test flush by RAM - that's not working yet on RT right?

True. I'm going to add that soonish. There's one thread-safety bug related to deletes that needs to be fixed too.

{quote}
Then I ran a single-threaded test. Trunk took 1097.1 sec and RT took 1040.5 sec - a bit faster! Presumably in the noise (we don't expect a speedup?), but excellent that it's not slower...
{quote}

Yeah I didn't expect much speedup - cool! :) Maybe because some code is gone, like the WaitQueue, not sure how much overhead that added in the single-threaded case.

{quote}
I think we lost infoStream output on the details of flushing? I can't see when which DWPTs are flushing...
{quote}

Oh yeah, good point, I'll add some infoStream messages to DWPT!
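To make the "flushing by docCount is per DWPT" point concrete, here is a rough, single-threaded simulation of how many initial segments such a run might produce. This is only a sketch and not Lucene code: the class and method names (PerThreadFlushSketch, PrivateWriter, simulate) are hypothetical, the corpus is assumed to be exactly 10,000,000 docs, and docs are assumed to be handed to the 6 writers round-robin, which real indexing threads would not do exactly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Lucene code): per-writer flushing by doc count. */
public class PerThreadFlushSketch {

    /** Stand-in for a DWPT: buffers docs privately and flushes its own segments. */
    static final class PrivateWriter {
        private final int maxBufferedDocs;
        private int bufferedDocs = 0;
        private final List<Integer> flushedSegmentSizes = new ArrayList<>();

        PrivateWriter(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        /** Buffers one doc; flushes when this writer alone reaches the limit. */
        void addDocument() {
            bufferedDocs++;
            if (bufferedDocs >= maxBufferedDocs) {
                flush();
            }
        }

        /** Records the buffered docs as one new segment, if there are any. */
        void flush() {
            if (bufferedDocs > 0) {
                flushedSegmentSizes.add(bufferedDocs);
                bufferedDocs = 0;
            }
        }

        List<Integer> segments() {
            return flushedSegmentSizes;
        }
    }

    /**
     * Feeds totalDocs to numWriters round-robin (an assumption made for
     * simplicity) and returns the sizes of all segments flushed, including
     * the final partial segment of each writer.
     */
    static List<Integer> simulate(int totalDocs, int numWriters, int maxBufferedDocs) {
        PrivateWriter[] writers = new PrivateWriter[numWriters];
        for (int i = 0; i < numWriters; i++) {
            writers[i] = new PrivateWriter(maxBufferedDocs);
        }
        for (int doc = 0; doc < totalDocs; doc++) {
            writers[doc % numWriters].addDocument();
        }
        List<Integer> all = new ArrayList<>();
        for (PrivateWriter w : writers) {
            w.flush(); // flush whatever is left at the end of the run
            all.addAll(w.segments());
        }
        return all;
    }

    public static void main(String[] args) {
        // Numbers from the perf test above: 10M docs, 6 threads, docCount=12870.
        List<Integer> segments = simulate(10_000_000, 6, 12_870);
        System.out.println("initial segments: " + segments.size());
    }
}
```

Under these assumptions the run ends up with several hundred small initial segments, which would be consistent with the guess that a lot of merging is happening; a higher docCount shrinks that number proportionally.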
> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.