[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-24 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238072#comment-17238072 ]

ASF subversion and git services commented on LUCENE-9508:
-

Commit c71f119e9ac0a179b0f2d1741306bb0046e12dac in lucene-solr's branch 
refs/heads/master from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c71f119 ]

LUCENE-9508: Fix DocumentsWriter to block threads until unstalled (#2085)

DWStallControl expects the caller to loop on top of the wait call so that it
keeps making progress with flushing while the DW is stalled. This loop was
missing, so DW stalled for only one second and then released the indexing
thread. This can cause an OOM if, for instance, one DWPT gets stuck during a
full flush and other threads keep on indexing.
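
The "loop on top of the wait call" contract described in the commit message can be sketched roughly as follows. This is a minimal illustration only, not the code from the commit: the class `StallControlSketch` and its methods `updateStalled`, `waitIfStalled`, `waitOnce` and `blockUntilUnstalled` are hypothetical names, and the timeout is a parameter rather than the one second mentioned above.

```java
/** Hypothetical stand-in for a stall control; NOT Lucene's actual class. */
final class StallControlSketch {
  private boolean stalled; // guarded by the monitor on "this"

  /** Sets the stalled flag and wakes all waiters once the stall is lifted. */
  synchronized void updateStalled(boolean newValue) {
    stalled = newValue;
    if (!newValue) {
      notifyAll();
    }
  }

  synchronized boolean isStalled() {
    return stalled;
  }

  /**
   * Waits at most timeoutMillis (must be > 0) if stalled. It may return while
   * the control is still stalled, so callers are expected to loop.
   */
  synchronized void waitIfStalled(long timeoutMillis) {
    if (stalled) {
      try {
        wait(timeoutMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while stalled", e);
      }
    }
  }

  /** Pattern without the loop: one bounded wait, then the caller proceeds. */
  boolean waitOnce(long timeoutMillis) {
    waitIfStalled(timeoutMillis);
    return isStalled(); // true means the caller would index while still stalled
  }

  /** Pattern with the loop: the caller stays blocked until unstalled. */
  void blockUntilUnstalled(long timeoutMillis) {
    while (isStalled()) {
      // A real caller could also help with pending flushes here (assumption).
      waitIfStalled(timeoutMillis);
    }
  }
}
```

With `waitOnce`, an indexing thread resumes after the timeout even though the stall has not been lifted, which matches the unbounded memory growth reported in the issue. With `blockUntilUnstalled`, it only resumes after another thread calls `updateStalled(false)`.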

> DocumentsWriter doesn't check for BlockedFlushes in stall mode``
> 
>
> Key: LUCENE-9508
> URL: https://issues.apache.org/jira/browse/LUCENE-9508
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 8.5.1
> Reporter: Sorabh Hamirwasia
> Priority: Major
> Labels: IndexWriter
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Hi,
> I was investigating an issue where the memory used by a single Lucene
> IndexWriter went up to ~23GB. Lucene has a concept of stalling when the
> memory used by an index breaches the 2 x ramBuffer limit (10% of JVM heap,
> in this case ~3GB), so ideally memory usage should not go above that limit.
> I looked into the heap dump and found that when the fullFlush thread enters
> the *markForFullFlush* method, it tries to take the lock on the ThreadState
> of each DWPT thread sequentially. If the lock on one of the ThreadStates is
> held by another thread, the fullFlush thread blocks indefinitely. This is
> what happened in my case: one of the DWPT threads was stuck in the indexing
> process. Because of this, the fullFlush thread was unable to populate the
> flush queue even though stall mode was detected. As a result, each new
> indexing request arriving on an indexing thread slept for a second and then
> continued with indexing. In the **preUpdate()** method, the thread checks
> for the stalled case and whether there are any pending flushes (based on the
> flush queue); if there are none, it sleeps and then continues.
> Questions:
> 1) Should **preUpdate** look into the blocked flushes information as well,
> instead of just the flush queue?
> 2) Should the fullFlush thread wait indefinitely for the lock on the
> ThreadStates? A single blocked writing thread can block the full flush here.
>  
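
The sequential locking described in the report can be illustrated with a small sketch. It assumes that plain `ReentrantLock` objects stand in for the ThreadStates; the class `SequentialLockSketch` and its methods are hypothetical and are not part of Lucene. To keep the demo from hanging, it uses `tryLock` with a timeout and reports the index of the lock it could not take, whereas the behaviour described above is an unbounded wait on that lock.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration; ReentrantLocks stand in for ThreadStates. */
final class SequentialLockSketch {

  /**
   * Visits the locks in order, as a full flush would visit each state.
   * Returns the index of the first lock that could not be acquired within
   * the timeout, or -1 if every lock was acquired.
   */
  static int firstBlockedIndex(List<ReentrantLock> states, long timeoutMillis) {
    for (int i = 0; i < states.size(); i++) {
      ReentrantLock lock = states.get(i);
      try {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
          return i; // with a plain lock() the caller would block here instead
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return i;
      }
      lock.unlock(); // a real flush would do its work before unlocking
    }
    return -1;
  }

  /** One "stuck" thread holds the second lock while the flush walks the list. */
  static int demo() {
    List<ReentrantLock> states =
        Arrays.asList(new ReentrantLock(), new ReentrantLock(), new ReentrantLock());
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread stuck = new Thread(() -> {
      states.get(1).lock();
      try {
        held.countDown();
        release.await(); // simulates an indexing thread that never finishes
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        states.get(1).unlock();
      }
    });
    stuck.start();
    try {
      held.await(); // make sure the stuck thread owns its lock first
      return firstBlockedIndex(states, 50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -2;
    } finally {
      release.countDown(); // let the stuck thread exit
    }
  }
}
```

In this sketch the walk gets past the first lock and stops at the second one, so a single thread that keeps holding its lock is enough to stop the whole pass.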



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-13 Thread Simon Willnauer (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231249#comment-17231249 ]

Simon Willnauer commented on LUCENE-9508:
-

Oops, sorry! I meant [~shamirwasia]. Thanks for the heads-up.







[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-12 Thread Zach Chen (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231109#comment-17231109 ]

Zach Chen commented on LUCENE-9508:
---

Hi [~simonw], just to clarify, I wasn't the one who originally reported the
issue. I just poked around in the code and ran some tests to see if I could
help here.







[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-11-12 Thread Simon Willnauer (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230497#comment-17230497 ]

Simon Willnauer commented on LUCENE-9508:
-

Hey Zach, thanks for opening this. Let me ask some questions and clarify what
is going on here first:
{quote} 2) Should the fullFlush thread wait indefinitely for the lock on
ThreadStates ? Since single blocking writing thread can block the full flush
here.
{quote}

Yes, we have to block on the ThreadStates here. This is the contract of a full
flush: it is what lets us commit changes atomically and establish a
happens-before relationship.

{quote}
1) Should *preUpdate* look into the blocked flushes information as well instead
of just flush queue ?
{quote}

I am not sure what it would do with the information in blocked flushes. Can you
elaborate on this? We can't let blocked flushes go until the full flush is
over, otherwise we will have inconsistent commits.

Can you share your IndexWriter config and how you configured the 10% heap?
Can you also share which thread holds the ThreadState that the full flush is
waiting for? I wonder what causes this situation.









[jira] [Commented] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-10-19 Thread Zach Chen (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217280#comment-17217280 ]

Zach Chen commented on LUCENE-9508:
---

It looks like the logic in *markForFullFlush* has received some changes from
https://issues.apache.org/jira/browse/LUCENE-9304, so I'm not sure whether
back-porting / upgrading would resolve this particular issue.

On the other hand, I did try playing with the latest code in master a bit and
came up with the following test, similar to what you described, which can
still block the main thread executing *markForFullFlush*. However, I'm not
sure whether this test case is relevant / valid in an actual production
environment.
{code:java}
// Must live in a test class in org.apache.lucene.index, since it reaches
// into package-private members of IndexWriter.
public void testBlockedFullFlush() throws IOException {
  try (Directory directory = newDirectory()) {
    try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig())) {
      writer.addDocument(new Document());

      DocumentsWriterPerThreadPool pool = writer.docWriter.perThreadPool;
      assertEquals(1, pool.size());

      CountDownLatch latch = new CountDownLatch(1);
      Thread longLockingThread = new Thread(() -> {
        // Keep the first DWPT locked for as long as the latch is closed.
        DocumentsWriterPerThread first = pool.getAndLock();

        DocumentsWriterPerThread second = pool.getAndLock();
        pool.marksAsFreeAndUnlock(second);

        assertEquals(2, pool.size());

        try {
          latch.await();
          pool.marksAsFreeAndUnlock(first);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
      });

      longLockingThread.start();
      // Blocks while longLockingThread still holds the lock on "first".
      writer.docWriter.flushControl.markForFullFlush();

      // Won't be able to reach this step as the line above blocked
      latch.countDown();
    }
  }
}

{code}
 



