[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978504#action_12978504 ]

Jason Rutherglen commented on LUCENE-2324:
------------------------------------------

{quote}actually what should be happening currently if the (default)
ThreadAffinityThreadPool is used. I've to check the code again and maybe write
a test specifically for that.{quote}

Let's try to test it, though I'm not immediately sure how the test case would look.

bq. let's add seqIDs back after the DWPT changes are done and in trunk.

Right.

{quote}True, the only global lock that locks all thread states happens when
flushAllThreads is called. This is called when IW explicitly triggers a flush,
e.g. on close/commit. However, maybe this is not the right approach?{quote}

I think this is fine for the DWPT branch, as flush, commit, and close are
explicitly blocking commands issued by the user. If we implemented something
more complex now, it wouldn't carry over to RT, because there the DWPTs don't
require flushing before they can be searched. That leaves the main drawback
for NRT, e.g., getReader: in that case a stop-the-world flush does affect
overall indexing performance. Perhaps we can add a flush that doesn't block
all DWPTs in a separate issue after the DWPT branch is merged to trunk, if
there's user need? Or perhaps it's easy to implement; I'm still trying to get
a feel for the lock progression in the branch.
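To make the lock progression concrete, here's a minimal sketch of the pattern being discussed: each DWPT has its own lock, a normal flush takes only one of them, and only flushAllThreads (commit/close/getReader) locks every thread state at once. All names here are illustrative, not the actual branch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the DWPT lock progression: per-thread locks for
// normal flushes, a lock-everything pass only for flushAllThreads().
class DwptPoolSketch {
    static class ThreadState {
        final ReentrantLock lock = new ReentrantLock();
        long bytesUsed;   // RAM consumed by this DWPT's private segment
        int flushCount;

        void flush() {    // caller must hold this.lock
            bytesUsed = 0;
            flushCount++;
        }
    }

    final List<ThreadState> states = new ArrayList<>();

    DwptPoolSketch(int numThreadStates) {
        for (int i = 0; i < numThreadStates; i++) {
            states.add(new ThreadState());
        }
    }

    // Per-thread flush: blocks only this one DWPT; others keep indexing.
    void flushOne(ThreadState state) {
        state.lock.lock();
        try {
            state.flush();
        } finally {
            state.lock.unlock();
        }
    }

    // "Stop the world": take every thread-state lock, flush each private
    // segment, then release. This is the commit/close/getReader path.
    void flushAllThreads() {
        for (ThreadState s : states) s.lock.lock();
        try {
            for (ThreadState s : states) s.flush();
        } finally {
            for (ThreadState s : states) s.lock.unlock();
        }
    }
}
```

The point of the sketch is that the global lock-all only appears on the explicit-flush path; addDocument traffic never needs it.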

In the indexing-many-documents case, the DWPTs will be flushed by the tiered
RAM system. It's the bulk-add case where we don't want to block all
threads/DWPTs at once; I think our main goal is to fix Mike's performance
test, with NRT being secondary or even a distraction.
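A rough sketch of the tiered-RAM idea, under my assumptions about how it would behave: when the combined RAM of all DWPTs crosses the budget, only the largest DWPT gets flushed, and the rest keep indexing. Names and the selection policy are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative tiered-RAM flush policy: flush the biggest DWPT when the
// global RAM budget is exceeded, instead of stopping every writer.
class TieredRamFlushSketch {
    static class Dwpt {
        long ramBytes;
        boolean flushed;
    }

    final List<Dwpt> dwpts = new ArrayList<>();
    final long ramBudgetBytes;

    TieredRamFlushSketch(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    long totalRam() {
        long sum = 0;
        for (Dwpt d : dwpts) sum += d.ramBytes;
        return sum;
    }

    // Called after each addDocument: while over budget, pick the DWPT
    // holding the most RAM and flush it; the others are undisturbed.
    void maybeFlush() {
        while (totalRam() > ramBudgetBytes) {
            Dwpt biggest = dwpts.stream()
                .max(Comparator.comparingLong(d -> d.ramBytes))
                .get();
            biggest.ramBytes = 0;
            biggest.flushed = true;
        }
    }
}
```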

{quote}But for b) and c) it's unclear what should happen if a DWPT flush fails
after some completed already successfully before.{quote}

Right, all of that would be solved if we moved IW wholesale to a Scala-like
asynchronous queuing model, but that's probably too much to do right now.
Would the bulk add-many-docs case need an error callback? No, because the
addDocument call that triggers the flush will report any exception(s).
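A small sketch of that last point, with hypothetical names rather than IW's real API: because the flush happens synchronously inside whichever addDocument call crosses the threshold, the flush's exception surfaces to that caller directly, so no separate error callback is needed.

```java
import java.io.IOException;

// Sketch: a flush triggered inside addDocument propagates its failure to
// the addDocument caller. Names and the failure flag are illustrative.
class FlushErrorSketch {
    private final int flushEvery;
    private final boolean failFlush;   // simulate a failing DWPT flush
    private int docsSinceFlush;

    FlushErrorSketch(int flushEvery, boolean failFlush) {
        this.flushEvery = flushEvery;
        this.failFlush = failFlush;
    }

    void addDocument(String doc) throws IOException {
        docsSinceFlush++;
        if (docsSinceFlush >= flushEvery) {
            flush();                   // any failure propagates to this caller
            docsSinceFlush = 0;
        }
    }

    private void flush() throws IOException {
        if (failFlush) {
            throw new IOException("simulated flush failure");
        }
    }
}
```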

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, 
> lucene-2324.patch, LUCENE-2324.patch, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
