[ https://issues.apache.org/jira/browse/LUCENE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659448#comment-13659448 ]
Simon Willnauer commented on LUCENE-4989:
-----------------------------------------

This might be related to LUCENE-5002, I think. This can happen in multiple scenarios. Can you tell if there are any other threads blocked in flush, by any chance?

> Hanging on DocumentsWriterStallControl.waitIfStalled forever
> ------------------------------------------------------------
>
>                 Key: LUCENE-4989
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4989
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.1
>         Environment: Linux 2.6.32
>            Reporter: Jessica Cheng
>              Labels: hang
>             Fix For: 5.0, 4.3.1
>
>
> In an environment where our underlying storage was timing out on various
> operations, we find all of our indexing threads eventually stuck in the
> following state (so far for 4 days):
> "Thread-0" daemon prio=5 Thread id=556 WAITING
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:74)
>   at org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:676)
>   at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
>   at ...
> I have not yet enabled detailed logging or tried to reproduce this, but looking
> at the code, I see that DWFC.abortPendingFlushes does
>         try {
>           dwpt.abort();
>           doAfterFlush(dwpt);
>         } catch (Throwable ex) {
>           // ignore - keep on aborting the flush queue
>         }
> (and the same for the blocked ones). Since the throwable is ignored, I can't
> say for sure, but I've seen DWPT.abort throw in other cases, so if it does
> throw, we'd fail to call doAfterFlush and properly decrement flushBytes. This
> can be a problem, right?
> Is it possible to do this instead:
>         try {
>           dwpt.abort();
>         } catch (Throwable ex) {
>           // ignore - keep on aborting the flush queue
>         } finally {
>           try {
>             doAfterFlush(dwpt);
>           } catch (Throwable ex2) {
>             // ignore - keep on aborting the flush queue
>           }
>         }
> It's ugly but safer. Otherwise, maybe at least add logging for the throwable
> just to make sure this is/isn't happening.
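
To make the reporter's concern concrete, here is a minimal, self-contained sketch of the two abort patterns quoted in the issue. It is not Lucene code: `PendingFlush`, `AbortAccountingSketch`, and every method on them are hypothetical stand-ins, and the byte counter only imitates the idea of `flushBytes` bookkeeping. It shows that when `abort()` throws, the first pattern skips the bookkeeping call while the `finally` variant still runs it. Whether this is what actually causes the reported hang is not established here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pending per-thread flush; NOT Lucene's DWPT.
interface PendingFlush {
    long bytes();
    void abort(); // may throw, e.g. when the underlying storage times out
}

// Hypothetical model of flush accounting; names are illustrative only.
class AbortAccountingSketch {
    private long flushBytes = 0;
    private final List<PendingFlush> queue = new ArrayList<>();

    void enqueue(PendingFlush f) {
        queue.add(f);
        flushBytes += f.bytes();
    }

    // Bookkeeping that must run once per pending flush.
    private void doAfterFlush(PendingFlush f) {
        flushBytes -= f.bytes();
    }

    // First pattern from the issue: bookkeeping sits inside the same try.
    void abortAllSameTry() {
        for (PendingFlush f : queue) {
            try {
                f.abort();
                doAfterFlush(f); // skipped whenever abort() throws
            } catch (Throwable ex) {
                // ignore - keep on aborting the flush queue
            }
        }
        queue.clear();
    }

    // Proposed pattern from the issue: bookkeeping moved into finally.
    void abortAllWithFinally() {
        for (PendingFlush f : queue) {
            try {
                f.abort();
            } catch (Throwable ex) {
                // ignore - keep on aborting the flush queue
            } finally {
                try {
                    doAfterFlush(f); // runs even if abort() threw
                } catch (Throwable ex2) {
                    // ignore - keep on aborting the flush queue
                }
            }
        }
        queue.clear();
    }

    long flushBytes() {
        return flushBytes;
    }

    // Builds a fake pending flush that optionally fails on abort().
    static PendingFlush flush(long bytes, boolean failOnAbort) {
        return new PendingFlush() {
            public long bytes() { return bytes; }
            public void abort() {
                if (failOnAbort) {
                    throw new IllegalStateException("simulated storage timeout");
                }
            }
        };
    }

    // Returns the bytes still accounted for after aborting two flushes,
    // one of which throws from abort().
    static long leakedBytes(boolean useFinally) {
        AbortAccountingSketch s = new AbortAccountingSketch();
        s.enqueue(flush(100, false));
        s.enqueue(flush(250, true));
        if (useFinally) {
            s.abortAllWithFinally();
        } else {
            s.abortAllSameTry();
        }
        return s.flushBytes();
    }

    public static void main(String[] args) {
        System.out.println("same-try variant leaves " + leakedBytes(false) + " bytes accounted");
        System.out.println("finally variant leaves " + leakedBytes(true) + " bytes accounted");
    }
}
```

In this toy model the same-try variant leaves the failed flush's bytes on the books, which is the kind of stale accounting the reporter suspects could keep indexing threads stalled. Logging the swallowed throwable, as suggested in the issue, would be a cheaper way to confirm whether this path is ever taken in practice.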