[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859906#action_12859906 ]
Michael McCandless commented on LUCENE-2324:
--------------------------------------------
{quote}
I'm not sure I understand how this would help for ParallelReader?
I think you can't use multi-threaded indexing even today, because you
have no control over the order in which the docs will make it into the
index.
{quote}
Well, to set up indexes for ParallelReader (PR) today, you have to run IndexWriter
in a very degraded state: flush by doc count, use a single thread, turn off
concurrent merging (use SerialMergeScheduler), and use LogDocMergePolicy.
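A simplified model of why those restrictions matter: ParallelReader lines up doc N across the parallel indexes, which holds when every index receives the docs in the same order and ends up with the same segments holding the same doc counts. This sketch only models flushing (merges are ignored), and `ParallelLayoutSketch`, `flushedSegmentSizes` and `aligned` are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelLayoutSketch {
    // Segment sizes a single-threaded writer produces when it flushes purely by
    // doc count (every maxBufferedDocs docs), plus one final partial flush.
    static List<Integer> flushedSegmentSizes(int totalDocs, int maxBufferedDocs) {
        List<Integer> sizes = new ArrayList<>();
        int remaining = totalDocs;
        while (remaining >= maxBufferedDocs) {
            sizes.add(maxBufferedDocs);
            remaining -= maxBufferedDocs;
        }
        if (remaining > 0) {
            sizes.add(remaining);
        }
        return sizes;
    }

    // Simplified alignment check: same number of segments, same doc count
    // in each, in the same order.
    static boolean aligned(List<Integer> a, List<Integer> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<Integer> indexA = flushedSegmentSizes(10, 4);
        List<Integer> indexB = flushedSegmentSizes(10, 4);
        System.out.println(indexA);                   // [4, 4, 2]
        System.out.println(aligned(indexA, indexB)); // true
        // A trigger that cuts the second index at different points breaks alignment:
        System.out.println(aligned(indexA, Arrays.asList(3, 5, 2))); // false
    }
}
```

The mismatched layout in the last line is the kind of thing a RAM-based flush trigger can produce, since parallel indexes hold different fields and therefore use different amounts of RAM for the same docs.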
{quote}
So having maxBufferedDocs per DWPT seems tempting to me. Then you know
that each written segment will contain exactly maxBufferedDocs docs,
so this is much more predictable. And if you index with a single
thread only, the behavior is identical to a "global" maxBufferedDocs
flush trigger.
{quote}
Yeah, maybe that'd be sufficient...? It'd sort of "match" the current
behaviour, in that you get segments flushed to the index with that many docs.
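To make the per-DWPT trigger concrete, here is a toy simulation, assuming each thread owns a private buffer that flushes as soon as it holds maxBufferedDocs docs, with leftover docs flushed on close. `PerThreadFlushSketch`, `simulate` and the `schedule` array (which thread handles each incoming doc) are hypothetical; this is not how DocumentsWriter is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class PerThreadFlushSketch {
    // schedule[i] = index of the (hypothetical) thread that indexes the i-th submitted doc.
    // Returns the flushed segments, each listed as the submission numbers of its docs.
    static List<List<Integer>> simulate(int[] schedule, int numThreads, int maxBufferedDocs) {
        List<List<Integer>> buffers = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            buffers.add(new ArrayList<>());
        }
        List<List<Integer>> segments = new ArrayList<>();
        for (int doc = 0; doc < schedule.length; doc++) {
            List<Integer> buf = buffers.get(schedule[doc]);
            buf.add(doc);
            // Per-thread trigger: only this thread's buffer is checked and flushed.
            if (buf.size() == maxBufferedDocs) {
                segments.add(new ArrayList<>(buf));
                buf.clear();
            }
        }
        // Final flush (e.g. on close): partial buffers become smaller segments.
        for (List<Integer> buf : buffers) {
            if (!buf.isEmpty()) {
                segments.add(new ArrayList<>(buf));
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        int[] oneThread = {0, 0, 0, 0, 0, 0, 0};
        System.out.println(simulate(oneThread, 1, 3));  // [[0, 1, 2], [3, 4, 5], [6]]
        int[] twoThreads = {0, 1, 0, 1, 1, 0, 0};
        System.out.println(simulate(twoThreads, 2, 3)); // [[1, 3, 4], [0, 2, 5], [6]]
    }
}
```

With one thread the segments match what a global maxBufferedDocs trigger would produce. With two threads every full segment still has exactly maxBufferedDocs docs, but which docs land in which segment (and so their final doc order) depends on thread scheduling.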
> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2324.patch, LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying Mike's summary from LUCENE-2293 here:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (e.g., maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
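A minimal Java sketch of the "N independent RAM segments" idea, assuming a fixed pool of private buffers: an indexing thread takes exclusive ownership of one buffer, adds the doc, flushes that buffer alone if it is full, and hands it back. `IndependentRamSegmentsSketch`, `RamSegment` and `run` are hypothetical names; doc stores, real segment files and merging are not modeled, and "flushing" only records a doc count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndependentRamSegmentsSketch {
    // Hypothetical stand-in for one private in-RAM segment.
    static final class RamSegment {
        final List<String> docs = new ArrayList<>();
    }

    private final BlockingQueue<RamSegment> pool;
    private final int maxBufferedDocs;
    // Doc counts of "flushed" segments; a real writer would write segment files instead.
    private final ConcurrentLinkedQueue<Integer> flushedSizes = new ConcurrentLinkedQueue<>();

    IndependentRamSegmentsSketch(int numSegments, int maxBufferedDocs) {
        this.pool = new ArrayBlockingQueue<>(numSegments);
        this.maxBufferedDocs = maxBufferedDocs;
        for (int i = 0; i < numSegments; i++) {
            pool.add(new RamSegment());
        }
    }

    void addDocument(String doc) throws InterruptedException {
        // Take exclusive ownership of one segment. The queue's lock is held only
        // for the hand-off, not while the doc is buffered or the segment flushed.
        RamSegment seg = pool.take();
        try {
            seg.docs.add(doc);
            if (seg.docs.size() >= maxBufferedDocs) {
                flush(seg); // other threads keep indexing into the remaining segments
            }
        } finally {
            pool.put(seg);
        }
    }

    private void flush(RamSegment seg) {
        flushedSizes.add(seg.docs.size());
        seg.docs.clear();
    }

    // Flush whatever is still buffered; call only after all indexing threads are done.
    void close() {
        for (RamSegment seg : pool) {
            if (!seg.docs.isEmpty()) {
                flush(seg);
            }
        }
    }

    static List<Integer> run(int threads, int docsPerThread, int numSegments, int maxBufferedDocs)
            throws Exception {
        IndependentRamSegmentsSketch writer = new IndependentRamSegmentsSketch(numSegments, maxBufferedDocs);
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                tasks.add(() -> {
                    for (int i = 0; i < docsPerThread; i++) {
                        writer.addDocument("thread" + id + "-doc" + i);
                    }
                    return null;
                });
            }
            for (Future<Void> f : exec.invokeAll(tasks)) {
                f.get(); // rethrows any failure from an indexing thread
            }
        } finally {
            exec.shutdown();
        }
        writer.close();
        return new ArrayList<>(writer.flushedSizes);
    }

    public static void main(String[] args) throws Exception {
        List<Integer> sizes = run(4, 1000, 4, 100);
        int total = 0;
        for (int n : sizes) {
            total += n;
        }
        System.out.println(total); // 4000: every doc lands in exactly one flushed segment
    }
}
```

Because each buffer is flushed by the thread that filled it, one segment's flush does not stop the other threads from indexing, which is the IO/CPU overlap the summary is after.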