[ 
https://issues.apache.org/jira/browse/LUCENE-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268666#comment-16268666
 ] 

Michael McCandless commented on LUCENE-8068:
--------------------------------------------

This is a really nice idea!  It gives more granular control over moving IW's 
RAM buffer to disk, and lets threads other than the indexing threads 
participate in flushing.

Can you mark the new method as {{@lucene.experimental}}?

There's no guarantee it flushes the largest (most heap-consuming) DWPT, right?  
I think that's fine.  It looks like it first tries to find any DWPT already 
marked for flush, and failing that, it then finds the largest one.
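
To make sure I'm reading the selection order correctly, here is a minimal 
sketch of it. This is not the patch's code: {{SegmentBuffer}}, 
{{FlushSelectionSketch}} and {{nextToFlush}} are hypothetical names, and 
skipping empty buffers is my assumption.

```java
import java.util.List;

// Hypothetical stand-in for a per-thread in-memory segment (DWPT); not Lucene's class.
final class SegmentBuffer {
  final String name;
  final long bytesUsed;
  boolean flushPending;

  SegmentBuffer(String name, long bytesUsed, boolean flushPending) {
    this.name = name;
    this.bytesUsed = bytesUsed;
    this.flushPending = flushPending;
  }
}

final class FlushSelectionSketch {
  // Prefer any buffer already marked flush-pending; otherwise fall back to the
  // largest non-empty buffer. Returns null if there is nothing to flush.
  static SegmentBuffer nextToFlush(List<SegmentBuffer> buffers) {
    for (SegmentBuffer b : buffers) {
      if (b.flushPending) {
        return b; // first pending buffer wins, regardless of its size
      }
    }
    SegmentBuffer largest = null;
    for (SegmentBuffer b : buffers) {
      // Assumption: empty buffers are not worth flushing.
      if (b.bytesUsed > 0 && (largest == null || b.bytesUsed > largest.bytesUsed)) {
        largest = b;
      }
    }
    return largest;
  }
}
```

Note that a small pending buffer is picked ahead of a much larger non-pending 
one, which is why the largest DWPT is not guaranteed to be flushed.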

What if the largest one is currently still indexing a document (via another 
thread)?  Do we wait (on the {{lock}} call) for that one document to finish?  
Or are we only iterating over DWPTs not currently indexing a document?  But 
then is there a starvation risk?

Is there a (small) concurrency risk that a flush (via another thread) is called 
right after you first asked for the next pending DWPT, got null, and then tried 
to find the largest non-pending DWPT, but the concurrent flush has by now marked 
them all pending?  I think it's fine if so; maybe explain in the javadocs that 
this is just "best effort"?

Should the new method maybe return a boolean indicating whether it actually 
wrote a segment?
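
A boolean return would let an external memory controller stop once nothing 
more can be written. A rough sketch of such a caller, assuming the method 
returns true only when a segment was written; the {{FlushableWriter}} 
interface and {{MemoryController}} class are hypothetical and only stand in 
for IW here:

```java
// Hypothetical view of a writer; the boolean return is the suggestion above.
interface FlushableWriter {
  long ramBytesUsed();

  // Assumed contract: true only if a segment was actually written.
  boolean flushNextBuffer();
}

final class MemoryController {
  // Flushes one buffer at a time until the writer is within budget or a
  // flush writes nothing. Returns how many flushes wrote a segment.
  static int flushUntilUnder(FlushableWriter writer, long budgetBytes) {
    int flushed = 0;
    while (writer.ramBytesUsed() > budgetBytes) {
      if (!writer.flushNextBuffer()) {
        break; // nothing was written; stop instead of spinning
      }
      flushed++;
    }
    return flushed;
  }
}
```

Without the return value the loop has no way to tell "still over budget but 
nothing flushable right now" apart from "made progress".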

Maybe instead of using "documents writer per thread" and "writer per thread 
buffer" in the javadocs, just refer to them as the per-thread in-memory 
segments?

Small typo here ("non" -> "none"):

{noformat}
+   * Returns the largest non-pending flushable DWPT or <code>null</code> if there is non.
{noformat}

Maybe assert that {{freeList.remove}} returned true here?

{noformat}
+  ThreadState getAndLock(ThreadState state) {
+    synchronized (this) {
+      freeList.remove(state);
+    }
+    state.lock();
+    return state;
+  }
+
{noformat}
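
Something along these lines is what I have in mind; it is a standalone 
sketch, not the patch. {{PoolSketch}}, {{release}} and {{freeCount}} are 
hypothetical, and {{ThreadState}} is reduced to a bare lock for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Simplified stand-in for the patch's ThreadState: just a lock.
final class ThreadState extends ReentrantLock {}

final class PoolSketch {
  private final List<ThreadState> freeList = new ArrayList<>();

  // Hypothetical helper that puts a state back on the free list.
  synchronized void release(ThreadState state) {
    freeList.add(state);
  }

  synchronized int freeCount() {
    return freeList.size();
  }

  ThreadState getAndLock(ThreadState state) {
    synchronized (this) {
      // Keep the removal outside the assert expression so that it still
      // happens when assertions are disabled.
      boolean removed = freeList.remove(state);
      assert removed : "ThreadState was not in the free list";
    }
    state.lock();
    return state;
  }
}
```

The assert only fires when the JVM runs with {{-ea}}, as the tests do, so it 
costs nothing in production.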


> Allow IndexWriter to write a single DWPT to disk
> ------------------------------------------------
>
>                 Key: LUCENE-8068
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8068
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Simon Willnauer
>             Fix For: master (8.0), 7.2
>
>         Attachments: LUCENE-8068.patch, LUCENE-8068.patch
>
>
> Today IW can only flush a DWPT to disk if an external resource calls 
> flush() or refreshes an NRT reader, or if a DWPT is selected as flush pending. 
> Yet, the latter has the problem that it always ties up an indexing thread, and 
> if flush / NRT refresh is called, a whole bunch of indexing threads are tied 
> up. If IW could offer a simple `flushNextBuffer()` method that synchronously 
> flushes the next pending or biggest active buffer to disk, memory could be 
> controlled in a more fine-grained fashion from outside of the IW. This is 
> for instance useful if more than one IW (shards) must be maintained in a 
> single JVM / system. 



