[
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983346#action_12983346
]
Michael Busch commented on LUCENE-2324:
---------------------------------------
bq. Why does DW.anyDeletions need to be sync'd?
Hmm good point. Actually only the call to DW.pendingDeletes.any() needs to be
synced, but not the loop that calls the DWPTs.
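
To illustrate the narrower locking, here is a minimal sketch. All class and method names below are hypothetical stand-ins, not the actual classes on the branch. Only the read of the shared pendingDeletes state takes the DW lock; the loop over the DWPTs runs outside it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for DW.pendingDeletes; not the real SegmentDeletes class.
class AnyDelPendingDeletes {
    private int numTerms;
    void addTerm() { numTerms++; }
    boolean any() { return numTerms > 0; }
}

// Hypothetical stand-in for a DWPT; the flag is volatile so that it can be
// read without holding the DW lock.
class AnyDelDwpt {
    private volatile boolean hasDeletes;
    void bufferDelete() { hasDeletes = true; }
    boolean anyDeletions() { return hasDeletes; }
}

class AnyDelDocumentsWriter {
    private final AnyDelPendingDeletes pendingDeletes = new AnyDelPendingDeletes();
    private final List<AnyDelDwpt> dwpts = new CopyOnWriteArrayList<>();

    void addDwpt(AnyDelDwpt dwpt) { dwpts.add(dwpt); }

    synchronized void bufferGlobalDelete() { pendingDeletes.addTerm(); }

    boolean anyDeletions() {
        // Only the shared pendingDeletes state is read under the DW lock.
        synchronized (this) {
            if (pendingDeletes.any()) {
                return true;
            }
        }
        // The loop over the DWPTs runs without the DW lock.
        for (AnyDelDwpt dwpt : dwpts) {
            if (dwpt.anyDeletions()) {
                return true;
            }
        }
        return false;
    }
}
```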
{quote}
In ThreadAffinityDWTP... it may be better if we had a single queue,
where threads wait in line, if no DWPT is available? And when a DWPT
finishes it then notifies any waiting threads? (Ie, instead of queue-per-DWPT).
{quote}
Whole Foods instead of Safeway? :)
Yeah that would be fairer. A large doc (= a full cart) wouldn't block unlucky
other docs. I'll make that change, good idea!
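
A rough sketch of the single-line idea, assuming the free DWPTs live in one shared fair blocking queue. The class names are hypothetical, and this is not the thread pool code from the patch. A thread takes whichever DWPT frees up first instead of waiting behind one particular DWPT.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a DWPT.
class PooledDwpt {
    final int id;
    PooledDwpt(int id) { this.id = id; }
}

// Hypothetical pool: one queue of free DWPTs shared by all indexing threads.
class SingleQueueDwptPool {
    private final BlockingQueue<PooledDwpt> free;

    SingleQueueDwptPool(int numDwpts) {
        // fair = true: threads blocked in take() are served in FIFO order.
        free = new ArrayBlockingQueue<>(numDwpts, true);
        for (int i = 0; i < numDwpts; i++) {
            free.add(new PooledDwpt(i));
        }
    }

    // Blocks until any DWPT is free.
    PooledDwpt acquire() throws InterruptedException {
        return free.take();
    }

    // Non-blocking variant; returns null if every DWPT is busy.
    PooledDwpt tryAcquire() {
        return free.poll();
    }

    // Returning a DWPT wakes up a waiting thread, if there is one.
    void release(PooledDwpt dwpt) {
        free.add(dwpt);
    }
}
```

An indexing thread would call acquire(), index its doc, and call release() in a finally block, so a large doc only ties up the one DWPT it is using.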
{quote}
I see the fieldInfos.update(dwpt.getFieldInfos()) (in
DW.updateDocument) - is there a risk that two threads bring a new
field into existence at the same time, but w/ different config? Eg
one doc omitsTFAP and the other doesn't? Or, on flush, does each DWPT
use its private FieldInfos to correctly flush the segment? (Hmm: do
we seed each DWPT w/ the original FieldInfos created by IW on init?).
{quote}
Every DWPT has its own private FieldInfos. When a segment is flushed the DWPT
uses its private FI and then it updates the original DW.fieldInfos (from IW),
which is a synchronized call.
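
As a sketch of that flow (hypothetical names, not the real FieldInfos API): each DWPT fills a private instance without locking, and on flush it is merged into the shared one through a synchronized update. The conflict rule used here, that omitTFAP stays on once any doc sets it, is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FieldInfos: maps field name -> omitTFAP flag.
class FieldInfosSketch {
    private final Map<String, Boolean> omitTfapByField = new HashMap<>();

    // Used by a DWPT on its private instance; no locking needed there.
    // Assumed rule: if any doc omits TFAP for a field, the field stays omitted.
    void add(String name, boolean omitTfap) {
        omitTfapByField.merge(name, omitTfap, (a, b) -> a || b);
    }

    // Called on the shared instance when a DWPT flushes its segment.
    synchronized void update(FieldInfosSketch flushed) {
        for (Map.Entry<String, Boolean> e : flushed.omitTfapByField.entrySet()) {
            add(e.getKey(), e.getValue());
        }
    }

    // Returns null if the field is unknown.
    synchronized Boolean omitTfap(String name) {
        return omitTfapByField.get(name);
    }
}
```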
The only consumer of DW.getFieldInfos() is SegmentMerger in IW. Hmm, given
that IW.flush() isn't synchronized anymore I assume this can lead to a
problem? E.g. the SegmentMerger gets a FieldInfos that's "newer" than the list
of segments it's trying to flush?
bq. How are we handling the case of open IW, do delete-by-term but no added
docs?
DW has a SegmentDeletes (pendingDeletes) which gets pushed to the last segment.
We only add delTerms to DW.pendingDeletes if we couldn't push it to any DWPT.
Btw, I think the whole pushDeletes business isn't working correctly yet; I'm
looking into it. I need to understand the code that coalesces the deletes
better.
bq. In DW.deleteTerms... shouldn't we skip a DWPT if it has no buffered docs?
Yeah, I did that already, but haven't committed it yet.
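
The routing rule for delete terms described above could be sketched roughly like this. The names are hypothetical and plain strings stand in for Terms. It only shows where a term ends up, not how deletes are coalesced or pushed to segments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DWPT with its own buffered delete terms.
class DeleteDwpt {
    int numBufferedDocs;
    final List<String> bufferedDeleteTerms = new ArrayList<>();
}

// Hypothetical stand-in for DW; pendingDeletes plays the role of DW.pendingDeletes.
class DeleteRoutingDw {
    final List<DeleteDwpt> dwpts = new ArrayList<>();
    final List<String> pendingDeletes = new ArrayList<>();

    synchronized void deleteTerm(String term) {
        boolean pushed = false;
        for (DeleteDwpt dwpt : dwpts) {
            // Skip DWPTs with no buffered docs: the term cannot match anything there.
            if (dwpt.numBufferedDocs == 0) {
                continue;
            }
            dwpt.bufferedDeleteTerms.add(term);
            pushed = true;
        }
        // Keep the term in DW.pendingDeletes only if no DWPT took it,
        // e.g. IW was opened and only delete-by-term was called.
        if (!pushed) {
            pendingDeletes.add(term);
        }
    }
}
```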
> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
> LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
> LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch,
> lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.