[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

Jason Rutherglen (JIRA) Sat, 22 Jan 2011 08:13:13 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985148#action_12985148
 ]


Jason Rutherglen commented on LUCENE-2324:
------------------------------------------

bq. the cost of replaying the log, assuming the log is "smallish"

Is this recording and replaying the doc-ids?  How and when does a previous BV 
become 'free' to be used by the next reader?  What if both readers are open at 
the same time?  And if it's the reader before the previous one that has been 
closed, won't that be quite a few doc-ids to save?  E.g., if a delete-by-query 
has removed thousands of docs, I guess we'd use System.arraycopy instead.  The 
most common case is updateDocument with [N]RT, which would generate few doc-ids.
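
For illustration only, the replay-versus-copy trade-off raised here could be sketched roughly as follows. Everything in it is hypothetical: DeletedDocsSketch, replay, cloneFull, next and the replayThreshold parameter are made-up names, not Lucene's BitVector API. The sketch also assumes that the recycled vector is no longer referenced by any open reader, and that the log holds every doc-id deleted since that vector was current.

```java
/**
 * Hypothetical sketch (not Lucene's BitVector): deleted-doc bits kept in a long[].
 */
final class DeletedDocsSketch {
    private final long[] bits;

    DeletedDocsSketch(int maxDoc) {
        // One bit per document, rounded up to whole 64-bit words.
        this.bits = new long[(maxDoc + 63) >>> 6];
    }

    private DeletedDocsSketch(long[] bits) {
        this.bits = bits;
    }

    void delete(int docId) {
        bits[docId >>> 6] |= 1L << (docId & 63);
    }

    boolean isDeleted(int docId) {
        return (bits[docId >>> 6] & (1L << (docId & 63))) != 0;
    }

    /** O(N) path: copy every word, however few bits changed. */
    DeletedDocsSketch cloneFull() {
        long[] copy = new long[bits.length];
        System.arraycopy(bits, 0, copy, 0, bits.length);
        return new DeletedDocsSketch(copy);
    }

    /** O(k) path: apply only the k logged doc-ids to this (recycled) vector. */
    void replay(int[] log, int count) {
        for (int i = 0; i < count; i++) {
            delete(log[i]);
        }
    }

    /**
     * Picks the vector for the next reader. Replays the log onto the recycled
     * vector when one is available and the log is small; otherwise falls back
     * to a full copy of the current vector.
     * Assumption: 'recycled' is not used by any open reader, and 'log' holds
     * all doc-ids deleted since 'recycled' was current.
     */
    static DeletedDocsSketch next(DeletedDocsSketch current, DeletedDocsSketch recycled,
                                  int[] log, int count, int replayThreshold) {
        if (recycled != null && count <= replayThreshold) {
            recycled.replay(log, count);
            return recycled;
        }
        return current.cloneFull();
    }
}
```

With a small log (the updateDocument case) the replay branch touches only a handful of words; a large delete-by-query exceeds the threshold and takes the System.arraycopy branch.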

bq. System.arraycopy, while fast, is still O(N)

Right, larger segments will adversely affect performance, as they do today; 
however, indexing is so much slower with NRT + clone that the copy cost is not 
noticeable.

bq. Using RT/NRT shouldn't slow down searching

Right!  The cost needs to be in the indexing and/or reopen threads.

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, 
> lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


