[ https://issues.apache.org/jira/browse/LUCENE-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268666#comment-16268666 ]
Michael McCandless commented on LUCENE-8068:
--------------------------------------------
This is a really nice idea! It gives more granular control over moving IW's
RAM buffer to disk, and lets threads other than the indexing threads
participate in flushing.
Can you mark the new method as {{@lucene.experimental}}?
There's no guarantee it flushes the largest (most heap-consuming) DWPT, right?
I think that's fine. It looks like it first tries to find any DWPT already
marked for flush, and failing that, it then finds the largest one.
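The selection order described above (any flush-pending buffer first, otherwise the largest one) can be sketched as follows. This is a toy model, not the patch: `MemBuffer` and `FlushSelector` are hypothetical stand-ins for the per-thread in-memory segments and the flush-control logic, and they ignore locking entirely.

```java
import java.util.List;

// Hypothetical stand-in for a per-thread in-memory segment (DWPT); not Lucene's class.
class MemBuffer {
    final String name;
    final long bytesUsed;
    final boolean flushPending;

    MemBuffer(String name, long bytesUsed, boolean flushPending) {
        this.name = name;
        this.bytesUsed = bytesUsed;
        this.flushPending = flushPending;
    }
}

class FlushSelector {
    // Prefer any buffer already marked flush-pending; otherwise fall back to the
    // largest buffer (all remaining ones are non-pending). Returns null if empty.
    static MemBuffer selectNext(List<MemBuffer> buffers) {
        for (MemBuffer b : buffers) {
            if (b.flushPending) {
                return b;
            }
        }
        MemBuffer largest = null;
        for (MemBuffer b : buffers) {
            if (largest == null || b.bytesUsed > largest.bytesUsed) {
                largest = b;
            }
        }
        return largest;
    }
}
```

Note that a small pending buffer wins over a large non-pending one, which is why there is no guarantee the biggest buffer gets flushed.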
What if the largest one is currently still indexing a document (via another
thread)? Do we wait (on the {{lock}} call) for that one document to finish?
Or are we only iterating over DWPTs not currently indexing a document? But
then is there a starvation risk?
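One way to avoid blocking on a buffer whose thread is mid-document is to use `tryLock()` and only consider buffers that are idle at that instant. The sketch below is a hypothetical model (the `LockedBuffer` and `NonBlockingPicker` names are illustrative, not from the patch). It never waits, but it shows the trade-off raised above: a buffer that is always busy can be skipped indefinitely.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: each per-thread buffer is guarded by its own lock, which the
// indexing thread holds while it adds a document.
class LockedBuffer extends ReentrantLock {
    final long bytesUsed;

    LockedBuffer(long bytesUsed) {
        this.bytesUsed = bytesUsed;
    }
}

class NonBlockingPicker {
    // Returns the largest buffer whose lock could be acquired without waiting, with
    // that lock still held by the caller; returns null if every buffer is busy.
    static LockedBuffer lockLargestIdle(List<LockedBuffer> buffers) {
        LockedBuffer best = null;
        for (LockedBuffer b : buffers) {
            if (b.tryLock()) {
                if (best == null || b.bytesUsed > best.bytesUsed) {
                    if (best != null) {
                        best.unlock(); // release the previous, smaller candidate
                    }
                    best = b;
                } else {
                    b.unlock(); // not larger than the current candidate
                }
            }
        }
        return best;
    }
}
```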
Is there a (small) concurrency risk that a flush (via another thread) is called
right after you first asked for the next pending DWPT, got null, and then tried
to find the largest non-pending DWPT, but the concurrent flush has now marked
them all pending? I think it's fine if so; maybe explain in the javadocs that
this is just "best effort"?
Should the new method maybe return a boolean indicating whether it actually
wrote a segment?
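A best-effort method that reports whether it actually wrote a segment could look like the toy model below. `ToyWriter` is hypothetical and only illustrates the contract: if a concurrent flush has already taken every buffer, the call is a harmless no-op and returns `false`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a writer holding per-thread in-memory segments.
class ToyWriter {
    private final List<Long> buffers = new ArrayList<>();         // bytes used per buffer
    private final List<Long> flushedSegments = new ArrayList<>(); // "written" segments

    synchronized void addBuffer(long bytes) {
        buffers.add(bytes);
    }

    // Best effort: flushes the largest buffer that exists at the moment of the call.
    // Returns true iff a segment was written, so callers can tell a no-op apart.
    boolean flushNextBuffer() {
        long bytes;
        synchronized (this) {
            if (buffers.isEmpty()) {
                return false; // nothing to flush, e.g. a concurrent flush already took it
            }
            int idx = 0;
            for (int i = 1; i < buffers.size(); i++) {
                if (buffers.get(i) > buffers.get(idx)) {
                    idx = i;
                }
            }
            bytes = buffers.remove(idx); // remove by index
        }
        // In a real writer the expensive segment write would run here, after the
        // buffer was detached under the selection lock.
        writeSegment(bytes);
        return true;
    }

    private synchronized void writeSegment(long bytes) {
        flushedSegments.add(bytes);
    }

    synchronized int segmentCount() {
        return flushedSegments.size();
    }
}
```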
Maybe instead of using "documents writer per thread" and "writer per thread
buffer" in the javadocs, just refer to them as the per-thread in-memory
segments?
Small typo here ("non" -> "none"):
{noformat}
+ * Returns the largest non-pending flushable DWPT or <code>null</code> if there is non.
{noformat}
Maybe assert that {{freeList.remove}} returned true here?
{noformat}
+ ThreadState getAndLock(ThreadState state) {
+ synchronized (this) {
+ freeList.remove(state);
+ }
+ state.lock();
+ return state;
+ }
+
{noformat}
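With the suggested assertion, `getAndLock` could look like the sketch below. `ThreadState` and `ThreadStatePool` here are minimal hypothetical stand-ins so the snippet compiles on its own; only the body of `getAndLock` mirrors the quoted patch. Java assertions run only when the JVM is started with `-ea`, which is how Lucene's test runs execute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins for the patch's ThreadState and its free list.
class ThreadState extends ReentrantLock {}

class ThreadStatePool {
    private final List<ThreadState> freeList = new ArrayList<>();

    synchronized void release(ThreadState state) {
        freeList.add(state);
    }

    ThreadState getAndLock(ThreadState state) {
        synchronized (this) {
            boolean removed = freeList.remove(state);
            // Catches callers that pass a state that is not currently free
            // (checked only when assertions are enabled with -ea).
            assert removed : "state was not on the free list: " + state;
        }
        state.lock();
        return state;
    }

    synchronized int freeCount() {
        return freeList.size();
    }
}
```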
> Allow IndexWriter to write a single DWPT to disk
> ------------------------------------------------
>
> Key: LUCENE-8068
> URL: https://issues.apache.org/jira/browse/LUCENE-8068
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Simon Willnauer
> Fix For: master (8.0), 7.2
>
> Attachments: LUCENE-8068.patch, LUCENE-8068.patch
>
>
> Today IW can only flush a DWPT to disk if an external resource calls
> flush() or refreshes an NRT reader, or if a DWPT is selected as flush pending.
> Yet the latter has the problem that it always ties up an indexing thread, and
> if flush / NRT refresh is called, a whole bunch of indexing threads are tied
> up. If IW could offer a simple `flushNextBuffer()` method that synchronously
> flushes the next pending or biggest active buffer to disk, memory could be
> controlled in a more fine-grained fashion from outside of the IW. This is,
> for instance, useful if more than one IW (shards) must be maintained in a
> single JVM / system.
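As an illustration of the multi-shard use case, here is a sketch of an external controller that keeps several writers under a shared RAM budget by flushing one buffer at a time from the hungriest shard. `ShardWriter` is a hypothetical interface modelled loosely on IW's RAM accounting and the proposed `flushNextBuffer()`; it is not Lucene's API, and the budget policy is only one possible choice.

```java
import java.util.List;

// Hypothetical view of a shard's writer; not Lucene's IndexWriter API.
interface ShardWriter {
    long ramBytesUsed();

    boolean flushNextBuffer(); // true iff a buffer was written to disk
}

class SharedRamController {
    private final long budgetBytes;

    SharedRamController(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    // Meant to run on a background thread, not an indexing thread: while the
    // combined buffers exceed the budget, flush one buffer from the hungriest shard.
    int enforceBudget(List<? extends ShardWriter> shards) {
        int flushes = 0;
        while (total(shards) > budgetBytes) {
            ShardWriter hungriest = null;
            for (ShardWriter s : shards) {
                if (hungriest == null || s.ramBytesUsed() > hungriest.ramBytesUsed()) {
                    hungriest = s;
                }
            }
            if (hungriest == null || !hungriest.flushNextBuffer()) {
                break; // nothing could be flushed this round; try again later
            }
            flushes++;
        }
        return flushes;
    }

    private static long total(List<? extends ShardWriter> shards) {
        long sum = 0;
        for (ShardWriter s : shards) {
            sum += s.ramBytesUsed();
        }
        return sum;
    }
}
```

Flushing one buffer per step keeps each call short and lets the controller re-evaluate which shard is largest after every write.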
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)