[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-14969:
----------------------------------
    Summary: Prevent creating multiple cores with the same name which leads to instabilities (race condition)  (was: Race condition when creating cores with the same name)

> Prevent creating multiple cores with the same name which leads to instabilities (race condition)
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14969
>                 URL: https://issues.apache.org/jira/browse/SOLR-14969
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: multicore
>    Affects Versions: 8.6, 8.6.3
>            Reporter: Andreas Hubold
>            Priority: Major
>
> CoreContainer#create does not correctly handle concurrent requests to create the same core. There's a race condition (see also existing TODO comment in the code), and CoreContainer#createFromDescriptor may be called subsequently for the same core name.
> The _second call_ then fails to create an IndexWriter, and exception handling causes an inconsistent CoreContainer state.
> {noformat}
> 2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [ ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core [blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
>     at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
>     ...
> Caused by: org.apache.solr.common.SolrException: Unable to create core [blueprint_acgqqafsogyc_comments]
>     at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
>     ... 47 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
>     at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
>     ... 48 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
>     at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
>     ... 50 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>     at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
>     at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
>     at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
>     at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
>     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
>     at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
>     at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
>     at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)
> {noformat}
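One way to avoid the race reported above is to reserve the core name atomically before any index work starts. The second caller is then rejected up front and never reaches the write.lock, and its failure handling cannot disturb the first caller's state. The following is only a minimal sketch under that assumption: `CoreRegistrySketch`, `reservedNames`, and `coreFactory` are hypothetical names made up for illustration. It is not Solr's CoreContainer code and not necessarily how the issue was or will be fixed.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a core registry; none of these names are Solr APIs. */
class CoreRegistrySketch<C> {
    // Names that are already registered or currently being created.
    private final Set<String> reservedNames = ConcurrentHashMap.newKeySet();
    private final Map<String, C> cores = new ConcurrentHashMap<>();

    C create(String name, Supplier<C> coreFactory) {
        // add() on a concurrent set is atomic: exactly one of several
        // concurrent callers with the same name gets 'true'.
        if (!reservedNames.add(name)) {
            throw new IllegalStateException(
                "Core '" + name + "' already exists or is being created");
        }
        try {
            C core = coreFactory.get(); // expensive part, e.g. opening the index
            cores.put(name, core);
            return core;
        } catch (RuntimeException e) {
            // Only the caller that owns the reservation rolls it back, so a
            // rejected duplicate request can never remove the winner's state.
            reservedNames.remove(name);
            throw e;
        }
    }

    C get(String name) {
        return cores.get(name);
    }
}
```

With this shape, a duplicate request fails fast with a clear message, and a failed creation releases the name so that a later retry can succeed.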
> CoreContainer#createFromDescriptor removes the CoreDescriptor when handling this exception. The SolrCore created for the first successful call is still registered in SolrCores.cores, but now there's no corresponding CoreDescriptor for that name anymore.
> This inconsistency leads to subsequent NullPointerExceptions, for example when using CoreAdmin STATUS with the core name: CoreAdminOperation#getCoreStatus first gets the non-null SolrCore (cores.getCore(cname)) but core.getInstancePath() throws an NPE, because the CoreDescriptor is not registered anymore:
> {noformat}
> 2020-10-27 00:29:25.353 INFO (qtp2029754983-19) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=blueprint_acgqqafsogyc_comments&action=STATUS&indexInfo=false&wt=javabin&version=2} status=500 QTime=0
> 2020-10-27 00:29:25.353 ERROR (qtp2029754983-19) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:372)
>     at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
>     ...
> Caused by: java.lang.NullPointerException
>     at org.apache.solr.core.SolrCore.getInstancePath(SolrCore.java:333)
>     at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:329)
>     at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:54)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
> {noformat}
> STATUS keeps failing until Solr is restarted.
> The NPE for CoreAdmin STATUS is a regression in 8.6.
> It seems to be caused by
> https://github.com/apache/lucene-solr/commit/17ae79b0905b2bf8635c1b260b30807cae2f5463#diff-9652fe8353b7eff59cd6f128bb2699d88361e670b840ee5ca1018b1bc45584d1R324

--
This message was sent by Atlassian Jira
(v8.3.4#803005)