[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446269#comment-16446269 ]

Christine Poerschke commented on SOLR-12232:

bq. ... Is this perhaps more properly a Lucene issue?

Good question. JIRA can support issue moves between projects -- I think, let me try that here, SOLR-12232 would become a forwarding link to the LUCENE issue.

> NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.1.1
> Reporter: Jeff Miller
> Assignee: Erick Erickson
> Priority: Minor
> Labels: NativeFSLockFactory, locking
> Original Estimate: 24h
> Time Spent: 10m
> Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race. If a thread that is running just happens to have the FileChannel open for NativeFSLockFactory and is interrupted, the channel is closed since it extends [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to make the core usable again as the ensureValid check forever throws an exception after.
>
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an external force: NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive invalid],creationTime=2018-04-06T21:45:11Z)
>   at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>   at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>   at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:113)
>   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>   at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>
> Proposed solution is using AsynchronousFileChannel instead, since this is only operating on a lock and .size method

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
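The failure described in the issue can be reproduced outside Lucene with plain JDK classes. The following is a minimal sketch, not Lucene's code: the class name `InterruptClosesChannel`, the `Result` holder, and the temporary file are all hypothetical. It takes a `FileLock` on a `FileChannel`, then lets a thread whose interrupt status is already set perform a read on that channel. Per the `InterruptibleChannel` contract the channel is closed and the thread receives `ClosedByInterruptException`; since a `FileLock` is only valid while its channel is open, the lock is invalidated too. The sketch uses `read` because its interrupt behavior is documented; the issue reports the same effect through the `size` call made by `ensureValid`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptClosesChannel {

    /** Hypothetical holder for what the experiment observed. */
    static final class Result {
        final boolean sawClosedByInterrupt;
        final boolean channelOpen;
        final boolean lockValid;

        Result(boolean sawClosedByInterrupt, boolean channelOpen, boolean lockValid) {
            this.sawClosedByInterrupt = sawClosedByInterrupt;
            this.channelOpen = channelOpen;
            this.lockValid = lockValid;
        }
    }

    static Result run() throws IOException, InterruptedException {
        // Stand-in for write.lock; not the real index lock file.
        Path file = Files.createTempFile("write", ".lock");
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            AtomicBoolean sawClosedByInterrupt = new AtomicBoolean(false);

            Thread worker = new Thread(() -> {
                // Set the interrupt status before touching the channel.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(1));
                } catch (ClosedByInterruptException e) {
                    sawClosedByInterrupt.set(true);
                } catch (IOException e) {
                    // Not expected in this sketch; leave the flag unset.
                }
            });
            worker.start();
            worker.join();

            // Observed from the main thread, which was never interrupted.
            return new Result(
                sawClosedByInterrupt.get(),
                channel.isOpen(),
                lock != null && lock.isValid());
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = run();
        System.out.println("ClosedByInterruptException seen: " + r.sawClosedByInterrupt);
        System.out.println("channel still open: " + r.channelOpen);
        System.out.println("lock still valid: " + r.lockValid);
    }
}
```

Note that the damage outlives the interrupted thread: the main thread, which was never interrupted, sees a closed channel and an invalid lock, which matches the "exclusive invalid" state in the stack trace above.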
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441909#comment-16441909 ]

Jeff Miller commented on SOLR-12232:

"Please don't shoot the messenger"

Sometimes it's about how you deliver it, not the message itself. Your point is understood and appreciated.
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441898#comment-16441898 ]

Robert Muir commented on SOLR-12232:

{quote}
Understood. But we don't have to use NIO.
{quote}

Yes, use another lock factory or some alternative if you want. But this is the NIO lock factory, and, well, it uses NIO. And its behavior is correct: it's wrong to interrupt the NIO stuff. It is definitely OK to dictate that it's wrong to interrupt NIO stuff; we document it that way for a reason, because it's dangerous. Lock validation and other checks here are important because they prevent crazy corruption-looking cases from showing up.

Please don't shoot the messenger but fix the actual bugs instead (the perp calling interrupt on Lucene threads).
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441890#comment-16441890 ]

Jeff Miller commented on SOLR-12232:

The thread interrupting is purposeful for our solution and won't be changing anytime soon due to external requirements. It worked just fine for quite a few years until the call to ensureValid was added. Since I saw no specific requirement for this class to close its file channel due to an interrupt exception, it seemed a decent solution, in case anyone else out there uses interrupts in any manner, without removing the ensureValid call for us. If no one sees value in this for Solr then so be it.
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441889#comment-16441889 ]

David Smiley commented on SOLR-12232:

By trade-off, I mean an app loses raw search speed (and you explained this well) but the app gains the ability to interrupt (cancel) a search task that is taking too long. However wise we may be, I don't think we ought to dictate to all users/apps that doing this is fundamentally wrong (what you call a bug).

bq. If you interrupt lucene threads using nio ...

Understood. But we don't have to use NIO.
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441878#comment-16441878 ]

Robert Muir commented on SOLR-12232:

There isn't a real tradeoff. I'm not even sure it counts as a "workaround". RAF must synchronize all reads, so it's basically like having only one thread; searches will pile up.

It has nothing to do with what I like or don't like. If you interrupt Lucene threads using NIO it's gonna look nasty, probably like index corruption. The whole point of the lock factory is to detect bugs in the code: it found one here in Solr (or some plugin or something). That's what needs to be fixed.
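To make the "RAF must synchronize all reads" point concrete, here is a small sketch of the two read patterns. It is an illustration under assumptions, not the code of RAFDirectory or of any Lucene directory; the class and method names are hypothetical. A `RandomAccessFile` has a single shared file pointer, so a seek followed by a read has to be done under a lock when several threads share the file. A `FileChannel` positional read takes the offset as an argument and leaves the channel's position alone, so callers need no lock of their own (whether such reads truly run in parallel depends on the JDK and the platform).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedFileReads {

    /** Shared file pointer: seek and read must happen atomically, so every reader queues here. */
    static byte readWithRaf(RandomAccessFile raf, long pos) throws IOException {
        synchronized (raf) {
            raf.seek(pos);
            return raf.readByte();
        }
    }

    /** Positional read: the channel's own position is not used, so no caller-side lock is needed. */
    static byte readWithChannel(FileChannel channel, long pos) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1);
        while (buf.hasRemaining()) {
            if (channel.read(buf, pos) < 0) {
                throw new EOFException("read past end of file at " + pos);
            }
        }
        return buf.get(0);
    }

    /** Reads the same byte through both paths and reports whether they agree. */
    static boolean demo() throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        try {
            Files.write(file, new byte[] {10, 20, 30, 40});
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
                 FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                return readWithRaf(raf, 2) == 30 && readWithChannel(channel, 2) == 30;
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("both paths read the same byte: " + demo());
    }
}
```

With many search threads sharing one file, the `synchronized` block in the first method is where requests would pile up.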
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441875#comment-16441875 ]

David Smiley commented on SOLR-12232:

Disclaimer: I haven't studied the approach in the patch.

Lucene has {{RAFDirectory}} (in misc) for apps that want to trade off raw performance for interruptibility. It uses RandomAccessFile and not NIO. Wouldn't it be appropriate to have a LockFactory impl that supports (safe) interruptibility too, so they can be used together?

I'm not sure I'm getting your point Rob... are you saying, indirectly, that interrupt-*safe* IO is impossible? Or maybe you don't like interruption at all so, in your opinion, anyone using it has made an error in judgement for using it?
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441844#comment-16441844 ]

Shawn Heisey commented on SOLR-12232:

[~rcmuir] likely has a much better understanding of the gory details than I do.

I've only written a few multi-threaded apps ... but I have *never* used Thread.interrupt. What little reading I've done on the subject tells me that doing so is likely to cause problems. Even if I were to research it and learn how to use interrupting properly, I would never use it on a thread that I didn't create -- especially not those in a comprehensive system like Solr or Lucene.
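One common way to cancel long-running work without calling Thread.interrupt on threads that may be inside NIO calls is a cooperative flag that the task polls between units of work. The sketch below is a generic Java illustration only; it does not use any Solr or Lucene API, and the class `CooperativeCancel` and its methods are hypothetical names. Whether this fits the reporter's external requirements is not something the thread establishes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CooperativeCancel {

    /**
     * Hypothetical long-running task: performs up to {@code limit} units of work
     * and checks the shared flag before each one.
     */
    static long workUntilCancelled(AtomicBoolean cancelled, long limit) {
        long done = 0;
        while (done < limit && !cancelled.get()) {
            done++; // stand-in for one unit of real work
        }
        return done;
    }

    /** Starts a worker, cancels it through the flag, and reports whether it stopped. */
    static boolean demo() throws InterruptedException {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        Thread worker = new Thread(() -> workUntilCancelled(cancelled, Long.MAX_VALUE));
        worker.start();

        Thread.sleep(50);      // let the worker run for a moment
        cancelled.set(true);   // request cancellation; no interrupt is sent
        worker.join(5000);     // wait for the worker to notice the flag

        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped without interrupt: " + demo());
    }
}
```

The trade-off is latency: the task only stops at the next flag check, so a single unit of work that blocks for a long time cannot be cut short this way.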
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441825#comment-16441825 ]

Robert Muir commented on SOLR-12232:

IMO it does not solve the problem. The correct fix is not to Thread.interrupt Lucene threads using NIO APIs.

It is not safe to use Thread.interrupt with NIO-based stuff with Lucene: we document that. It is good that locking detected the error in the code (use of Thread.interrupt) because it can have much more dangerous impacts (e.g. loss of a reader). Asynchronous channels are too slow and won't help there. In the future, maybe it's fixed in the JDK: http://mail.openjdk.java.net/pipermail/nio-dev/2018-March/004761.html

I don't think Lucene should mask the problem here because it will not solve anything for these reasons. Please fix the Thread.interrupt.
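For readers wondering why the reporter proposed AsynchronousFileChannel for the lock: that class does not extend AbstractInterruptibleChannel, so a `size` call from an interrupted thread is not expected to close it. The sketch below illustrates that expectation with plain JDK classes. It is not the patch attached to the issue; the class name and the temporary file are hypothetical, and the outcome reflects the author's understanding of typical JDK implementations rather than a documented guarantee. It also says nothing about index readers, which is the concern raised in the comment above.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AsyncChannelInterrupt {

    /** Returns {channelOpen, lockValid} as observed after an interrupted thread calls size(). */
    static boolean[] run() throws IOException, InterruptedException {
        // Stand-in for write.lock; not the real index lock file.
        Path file = Files.createTempFile("write", ".lock");
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();

            Thread worker = new Thread(() -> {
                // Same setup as a thread interrupted just before a validity check.
                Thread.currentThread().interrupt();
                try {
                    channel.size();
                } catch (IOException e) {
                    // Not expected in this sketch.
                }
            });
            worker.start();
            worker.join();

            return new boolean[] {channel.isOpen(), lock != null && lock.isValid()};
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println("channel still open: " + r[0]);
        System.out.println("lock still valid: " + r[1]);
    }
}
```

Even if the lock survives this way, an interrupt that lands in a FileChannel used for reading the index would still close that channel, which is why the comment above argues that changing the lock factory alone would only mask the problem.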
[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after
[ https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441816#comment-16441816 ]

Erick Erickson commented on SOLR-12232:

[~rcmuir] [~mikemccand] Is this perhaps more properly a Lucene issue?