[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592994#comment-16592994 ]

Shawn Heisey commented on SOLR-12691:
-------------------------------------

I did an experiment.

Downloaded 7.4.0. Created core1 through core25, all transient and not loading on startup. Set transientCacheSize to 5.

Accessed some cores in this order with the admin UI dropdown:

core1
core10
core11
core12
core13
core14

When core14 was accessed, Solr unloaded core1, exactly as expected.

With the admin UI, I did some queries on core10, and submitted an update on core11. Then I accessed core17 from the admin UI dropdown.

If the core unloading were LRU in the way that I think it should work, it would have been core12 that got unloaded when core17 was accessed. It was core10 that was unloaded, because it was the oldest entry in the LinkedHashMap. The fact that I had made queries on core10 did not make any difference, and I think it should have.

> Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12691
>                 URL: https://issues.apache.org/jira/browse/SOLR-12691
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 7.4
>            Reporter: Shawn Heisey
>            Priority: Minor
>         Attachments: SOLR-12691-test.patch
>
> When the LotsOfCores setting 'transientCacheSize' results in the unloading of cores that are getting updates, the indexes in one or more of those cores can get corrupted.
> How to reproduce:
> * Set the "transientCacheSize" to 1
> * Create two cores that are both set to transient.
> * Ingest high load to core1 only (no issue at this time)
> * Continue ingest high load to core1 and start ingest load to core2 simultaneously (core2 immediately corrupted)
> Error with stacktrace:
> {noformat}
> 2018-08-16 23:02:31.212 ERROR (qtp225472281-4098) [ x:aggregator-core-be43376de27b1675562841f64c498] o.a.s.u.SolrIndexWriter Error closing IndexWriter
> java.nio.file.NoSuchFileException: /opt/solr/volumes/data1/4cf838d4b9e4675-core-897/index/_2_Lucene50_0.pos
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:1.8.0_162]
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:1.8.0_162]
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:1.8.0_162]
>         at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) ~[?:1.8.0_162]
>         at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) ~[?:1.8.0_162]
>         at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) ~[?:1.8.0_162]
>         at java.nio.file.Files.readAttributes(Files.java:1737) ~[?:1.8.0_162]
>         at java.nio.file.Files.size(Files.java:2332) ~[?:1.8.0_162]
>         at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:217) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:558) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.TieredMergePolicy.getSegmentSizes(TieredMergePolicy.java:279) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:300) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2199) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2162) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3571) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1028) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1071) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>         at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:286) [solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
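The access sequence from the experiment can be replayed against a plain java.util.LinkedHashMap to see what each ordering mode would predict. This is only a sketch of the JDK behaviour, not of Solr's transient cache code: the class name, the replay method and the "loaded" placeholder value are made up for illustration, and a get() on the map stands in for a query or update reaching a core.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TransientCacheExperiment {
    /** Replays the access sequence and returns the core names in the order they were evicted. */
    static List<String> replay(boolean accessOrder, int cacheSize) {
        List<String> evicted = new ArrayList<>();
        Map<String, String> cache = new LinkedHashMap<String, String>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > cacheSize) {
                    evicted.add(eldest.getKey()); // stands in for unloading the core
                    return true;
                }
                return false;
            }
        };
        // Cores opened one after another from the admin UI dropdown.
        for (String core : new String[] {"core1", "core10", "core11", "core12", "core13", "core14"}) {
            cache.put(core, "loaded");
        }
        cache.get("core10"); // queries on core10
        cache.get("core11"); // update on core11
        cache.put("core17", "loaded"); // opening core17 forces one eviction
        return evicted;
    }

    public static void main(String[] args) {
        System.out.println("insertion order: " + replay(false, 5)); // [core1, core10]
        System.out.println("access order:    " + replay(true, 5));  // [core1, core12]
    }
}
```

In this model the observed result (core10 unloaded) matches the insertion-order run, while the expected LRU result (core12 unloaded) matches the access-order run. Whether a query or update in Solr actually reaches the map through a call that counts as an access is not something this sketch can answer.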
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592776#comment-16592776 ]

Erick Erickson commented on SOLR-12691:
---------------------------------------

[~elyograg] It _is_ an LRU cache. The LinkedHashMap is created as:

LinkedHashMap(Math.min(cacheSize, 1000), 0.75f, true)

From the Javadocs:
{quote}
public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder)

Constructs an empty {{LinkedHashMap}} instance with the specified initial capacity, load factor and ordering mode.

Parameters:
{{initialCapacity}} - the initial capacity
{{loadFactor}} - the load factor
{{accessOrder}} - the ordering mode - {{true}} for access-order, {{false}} for insertion-order
{quote}

Although the tests don't show that explicitly. Here's a slight change to TestLazyCores showing that the transient cache is LRU that should be incorporated in any fixes here.
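To make the three constructor arguments concrete, here is a small sketch using the same constructor call as quoted above. The class and method names are invented for the example. It shows two things: the first argument is only an initial capacity and does not cap the number of entries, and with accessOrder set to true a get() moves the entry to the end of the iteration order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderDemo {
    /** Inserts four keys, reads the first one, and returns the resulting iteration order. */
    static List<String> keysAfterTouch(boolean accessOrder) {
        int cacheSize = 2; // only sizes the hash table; it is not a limit on entries
        Map<String, Integer> map =
            new LinkedHashMap<>(Math.min(cacheSize, 1000), 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.put("d", 4);
        map.get("a"); // counts as an access only when accessOrder is true
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println("accessOrder=false: " + keysAfterTouch(false)); // [a, b, c, d]
        System.out.println("accessOrder=true:  " + keysAfterTouch(true));  // [b, c, d, a]
    }
}
```

The size limit therefore has to come from somewhere else, typically an overridden removeEldestEntry.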
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592767#comment-16592767 ]

Shawn Heisey commented on SOLR-12691:
-------------------------------------

An additional thought: Even if the problem can be found and fixed so the two-core reproduction scenario works perfectly, I can tell you that the performance will be *awful* as Solr continually unloads and loads cores. The same thing might happen in the real-world scenario. That would likely be preferable to index corruption, though.

[~erickerickson], should we have another issue to improve choosing which transient core to unload? Do it on an LRU basis, instead of load order?

I was looking into request handler code. A number of request handlers implement SolrCoreAware, but it's done on an individual handler basis, not on RequestHandlerBase. If the base class were to implement SolrCoreAware and handle updating the timestamp, I think the handler code would be overall cleaner, and we might be in a better position to make LRU unloading happen.
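The unload/load churn in the two-core scenario can be estimated with a toy model. The sketch below assumes that every cache miss costs one core load and that evicting an entry stands in for unloading a core; it says nothing about the corruption itself. ThrashSimulation, countLoads and alternating are hypothetical names, not Solr code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ThrashSimulation {
    /** Counts how many times a core has to be loaded for the given request stream. */
    static int countLoads(int cacheSize, String[] requests) {
        Map<String, Boolean> loaded = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > cacheSize; // evicting stands in for unloading a core
            }
        };
        int loads = 0;
        for (String core : requests) {
            if (loaded.get(core) == null) { // miss: the core is not currently loaded
                loads++;
                loaded.put(core, Boolean.TRUE);
            }
        }
        return loads;
    }

    /** Builds a request stream that alternates between core1 and core2. */
    static String[] alternating(int count) {
        String[] requests = new String[count];
        for (int i = 0; i < count; i++) {
            requests[i] = (i % 2 == 0) ? "core1" : "core2";
        }
        return requests;
    }

    public static void main(String[] args) {
        String[] requests = alternating(10);
        System.out.println("cache size 1: " + countLoads(1, requests) + " loads"); // 10 loads
        System.out.println("cache size 2: " + countLoads(2, requests) + " loads"); // 2 loads
    }
}
```

With a cache size of 1 and two cores receiving interleaved updates, every single request triggers a load in this model, which is the worst case for any eviction policy.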
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592199#comment-16592199 ]

Erick Erickson commented on SOLR-12691:
---------------------------------------

Shawn is largely correct here, this is the first time I've heard of this issue.

If you want to dig into the code and see if you can find a way to fix it, I'd be happy to review any patches you'd care to attach to this JIRA.
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591917#comment-16591917 ]

Shawn Heisey commented on SOLR-12691:
-------------------------------------

The person on the mailing list who found the problem asked this on the mailing list:

{quote}
I see it's marked as minor. Can we bump up the priority please ?
{quote}

Here is my response:

I don't consider it a high priority item. It's going to take a lot of effort to actually find the cause and fix it.

So far you're the only person that I've heard of that's tried it and had a failure. There may of course be others who never said anything, but it is certainly not a common use-case.

[~erickerickson] is the person who wrote the code here, and this was not a use-case that was ever considered. So as far as I can tell, in the eyes of the person who created the transient core feature, you're using it incorrectly. This is another reason for the priority choice.

If [~erickerickson] disagrees with my view, he is free to change the priority and I will not complain.
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589591#comment-16589591 ]

Erick Erickson commented on SOLR-12691:
---------------------------------------

{quote}
Queries very likely update a timestamp within the core that marks it as most-recently-used, but do updates do the same?
{quote}

Yes. The key is that the transient cache is a LinkedHashMap with the removeEldestEntry overridden. Any access to that structure moves the accessed core to the end of the list so it's now the last one to be aged out.
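The pattern described here, a LinkedHashMap with removeEldestEntry overridden, looks roughly like the following. This is a generic illustration and not the Solr implementation: LruCloseCache, the AutoCloseable stand-in for a core, and the core helper are all assumptions made for the example. The eldest entry is closed when the map grows past its limit, and a get() protects an entry from being the next one aged out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LruCloseCache<V extends AutoCloseable> extends LinkedHashMap<String, V> {
    private final int maxSize;

    LruCloseCache(int maxSize) {
        super(16, 0.75f, true); // true selects access order, which gives LRU eviction
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        if (size() <= maxSize) {
            return false;
        }
        try {
            eldest.getValue().close(); // release the least recently used entry
        } catch (Exception e) {
            throw new IllegalStateException("close failed for " + eldest.getKey(), e);
        }
        return true;
    }

    /** Hypothetical stand-in for a core: records its name when closed. */
    static AutoCloseable core(String name, List<String> closed) {
        return () -> closed.add(name);
    }

    /** Loads three cores into a cache of size 2, touching coreA before the third load. */
    static List<String> demo() {
        List<String> closed = new ArrayList<>();
        LruCloseCache<AutoCloseable> cache = new LruCloseCache<>(2);
        cache.put("coreA", core("coreA", closed));
        cache.put("coreB", core("coreB", closed));
        cache.get("coreA"); // coreA becomes most recently used
        cache.put("coreC", core("coreC", closed)); // coreB is now the eldest and gets closed
        return closed;
    }

    public static void main(String[] args) {
        System.out.println("closed: " + demo()); // closed: [coreB]
    }
}
```

Note that the reordering only happens for accesses that go through the map itself (get, put and similar). Work done on an object that was fetched earlier does not refresh its position.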
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589589#comment-16589589 ]

Shawn Heisey commented on SOLR-12691:
-------------------------------------

Making an attempt to check the code, I actually can't see anything that tracks or uses a most-recently-used timestamp. So I'm wondering how LotsOfCores actually decides which core(s) to close, and whether that might need some work.
[jira] [Commented] (SOLR-12691) Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
[ https://issues.apache.org/jira/browse/SOLR-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589582#comment-16589582 ]

Shawn Heisey commented on SOLR-12691:
-------------------------------------

Thoughts:

This is a scenario that was not considered when LotsOfCores functionality was built. If system resources are too scarce to allow all cores getting updates to be loaded at the same time, this problem is likely.

Queries very likely update a timestamp within the core that marks it as most-recently-used, but do updates do the same? If not, fixing that might make the issue a lot less likely in the production setup of the user that reported this problem on the mailing list, where they have 75 cores (which will be increasing) and a transientCacheSize of 32. They have indicated that the majority of their cores are unlikely to see simultaneous updates. Making sure that the core's most-recently-used information is refreshed by update requests might be an easy win for real-world usage.

Finding the cause of this problem might prove to be very difficult. Depending on what's found, fixing it might be even harder. It might never be possible to fix the two-core setup that reproduces this problem.

> Index corruption when sending updates to multiple cores, if those cores can get unloaded by LotsOfCores
> ---
>
> Key: SOLR-12691
> URL: https://issues.apache.org/jira/browse/SOLR-12691
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.4
> Reporter: Shawn Heisey
> Priority: Minor
>
> When the LotsOfCores setting 'transientCacheSize' results in the unloading of cores that are getting updates, the indexes in one or more of those cores can get corrupted.
> How to reproduce:
> * Set the "transientCacheSize" to 1
> * Create two cores that are both set to transient.
> * Ingest high load to core1 only (no issue at this time)
> * Continue ingest high load to core1 and start ingest load to core2 simultaneously (core2 immediately corrupted)
> Error with stacktrace:
> {noformat}
> 2018-08-16 23:02:31.212 ERROR (qtp225472281-4098) [ x:aggregator-core-be43376de27b1675562841f64c498] o.a.s.u.SolrIndexWriter Error closing IndexWriter
> java.nio.file.NoSuchFileException: /opt/solr/volumes/data1/4cf838d4b9e4675-core-897/index/_2_Lucene50_0.pos
>     at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:1.8.0_162]
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:1.8.0_162]
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:1.8.0_162]
>     at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) ~[?:1.8.0_162]
>     at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) ~[?:1.8.0_162]
>     at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) ~[?:1.8.0_162]
>     at java.nio.file.Files.readAttributes(Files.java:1737) ~[?:1.8.0_162]
>     at java.nio.file.Files.size(Files.java:2332) ~[?:1.8.0_162]
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:217) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:558) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.index.TieredMergePolicy.getSegmentSizes(TieredMergePolicy.java:279) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:300) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2199) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
>     at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2162) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc -
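
To picture the most-recently-used question raised in the comment above, here is a minimal sketch. It is not Solr's actual transient cache code. It assumes only that the cache behaves like a size-bounded java.util.LinkedHashMap, and it compares insertion order with access order. The names TransientCacheSketch, BoundedCache and replay are hypothetical, and the core names and the cache size of 5 are borrowed from the admin UI experiment described elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a transient core cache; NOT Solr's real implementation.
class TransientCacheSketch {

    /** Size-bounded map that records which keys it evicts. */
    static final class BoundedCache extends LinkedHashMap<String, String> {
        private final int maxSize;
        final List<String> evicted = new ArrayList<>();

        BoundedCache(int maxSize, boolean accessOrder) {
            // Third argument: true = access order (LRU), false = insertion order.
            super(16, 0.75f, accessOrder);
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            // Called by put() after the new entry has been added.
            if (size() > maxSize) {
                evicted.add(eldest.getKey());
                return true;
            }
            return false;
        }
    }

    /** Replays the access pattern and returns the evicted core names in order. */
    static List<String> replay(boolean accessOrder) {
        BoundedCache cache = new BoundedCache(5, accessOrder);
        String[] loaded = {"core1", "core10", "core11", "core12", "core13", "core14"};
        for (String name : loaded) {
            cache.put(name, "loaded");
        }
        cache.get("core10"); // stands in for queries on core10
        cache.get("core11"); // stands in for an update on core11
        cache.put("core17", "loaded");
        return cache.evicted;
    }

    public static void main(String[] args) {
        System.out.println("insertion order evicts: " + replay(false));
        System.out.println("access order evicts:    " + replay(true));
    }
}
```

With insertion order, replay(false) evicts core1 and then core10. With access order, replay(true) evicts core1 and then core12, because the get calls move core10 and core11 to the most-recently-used end. Whether update requests in Solr reach an equivalent of get is exactly the open question in the comment.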