[ https://issues.apache.org/jira/browse/SOLR-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061939#comment-17061939 ]
Richard Goodman commented on SOLR-14325:
----------------------------------------

Hi David, we noticed the following when trying to recover some instances on a bigger cluster setup, and because of it we rolled back:

{code}
2020-03-17 13:27:38.288 INFO (qtp511717113-1079) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={wt=json} status=500 QTime=4821977
2020-03-17 13:27:38.289 ERROR (qtp511717113-1079) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
	at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:363)
	at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
	at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
	at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.eclipse.jetty.server.Server.handle(Server.java:502)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException: /data/solr/solrcloud-cluster0/data/a_collection_shard9_replica_n30/data/index.20200317120438926/_1eo70_6t1.liv
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
	at java.nio.channels.FileChannel.open(FileChannel.java:287)
	at java.nio.channels.FileChannel.open(FileChannel.java:335)
	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
	at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:181)
{code}

I had a look at your suggestions and got a bit confused: {{SolrCore.getIndexReaderFactory}} doesn't return a {{DirectoryReader}} unless you create a new one. Does that seem okay?

I ended up making the following changes to the patch above:

{code}
diff --git a/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java b/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java
index 71802debbff..00495cdf2a7 100644
--- a/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java
+++ b/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java
@@ -36,6 +36,7 @@ import org.apache.solr.common.util.SimpleOrderedMap;
 import org.apache.solr.common.util.Utils;
 import org.apache.solr.core.CoreContainer;
 import org.apache.solr.core.CoreDescriptor;
+import org.apache.solr.core.IndexReaderFactory;
 import org.apache.solr.core.SolrCore;
 import org.apache.solr.core.SolrInfoBean;
 import org.apache.solr.core.snapshots.SolrSnapshotManager;
@@ -337,9 +338,10 @@ enum CoreAdminOperation implements CoreAdminOp {
         info.add("cloud", cloudInfo);
       }
       if (isIndexInfoNeeded) {
-        try (DirectoryReader dirReader = DirectoryReader.open(core.getDirectory())) {
+        IndexReaderFactory indexReaderFactory = core.getIndexReaderFactory();
+        try (DirectoryReader dirReader = indexReaderFactory.newReader(core.getDirectory(), core)) {
           SimpleOrderedMap<Object> indexInfo = LukeRequestHandler.getIndexInfo(dirReader);
-          long size = core.getIndexSize();
+          long size = core.getDirectoryFactory().size(core.getDirectory());
           indexInfo.add("sizeInBytes", size);
           indexInfo.add("size", NumberUtils.readableSize(size));
           info.add("index", indexInfo);
{code}

I then tested this on my small dev cluster: I turned one node off _(to preserve a state of one replica)_ and then forced an expungeDelete on a core on the other node to make the core change state. When I turned the first node back on, the replica recovered fine, with 0 errors. I'm not sure yet whether this has fixed the problem.

It would be great if you could confirm from looking at the code. In the meantime, I'm hoping to have a bigger cluster, at a scale comparable to our live clusters, to experiment on tomorrow so I can continue testing.

> Core status could be improved to not require an IndexSearcher
> -------------------------------------------------------------
>
>                 Key: SOLR-14325
>                 URL: https://issues.apache.org/jira/browse/SOLR-14325
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: SOLR-14325.patch
>
>
> When the core status is told to request "indexInfo", it currently grabs the
> SolrIndexSearcher but only to grab the Directory. SolrCore.getIndexSize also
> only requires the Directory. By insisting on a SolrIndexSearcher, we
> potentially block for awhile if the core is in recovery since there is no
> SolrIndexSearcher.
> [https://lists.apache.org/thread.html/r076218c964e9bd6ed0a53133be9170c3cf36cc874c1b4652120db417%40%3Cdev.lucene.apache.org%3E]
> It'd be nice to have a solution that conditionally used the Directory of the
> SolrIndexSearcher only if it's present so that we don't waste time creating
> one either.
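
The closing request in the issue description, to use the Directory of a SolrIndexSearcher only when one is already present and otherwise avoid creating one, can be sketched in plain Java as below. This is only an illustration of the conditional lookup: {{Dir}}, {{Searcher}}, and {{resolveDirectory}} are hypothetical stand-ins invented for this example, not Solr or Lucene classes, and the sketch is not what the attached patch or the diff above does.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class IndexInfoSketch {

    /** Hypothetical stand-in for a directory handle; it only records where it came from. */
    public static final class Dir {
        public final String source;

        public Dir(String source) {
            this.source = source;
        }
    }

    /** Hypothetical stand-in for a searcher that already holds an open directory. */
    public static final class Searcher {
        public final Dir directory;

        public Searcher(Dir directory) {
            this.directory = directory;
        }
    }

    /**
     * Returns the open searcher's directory when a searcher is present.
     * The fallback supplier is invoked only when no searcher exists,
     * so nothing is created or waited on in the common case.
     */
    public static Dir resolveDirectory(Optional<Searcher> openSearcher, Supplier<Dir> fallback) {
        return openSearcher.map(s -> s.directory).orElseGet(fallback);
    }

    public static void main(String[] args) {
        Supplier<Dir> fallback = () -> new Dir("opened-from-core");

        // A searcher is already open: reuse its directory.
        Optional<Searcher> present = Optional.of(new Searcher(new Dir("from-searcher")));
        System.out.println(resolveDirectory(present, fallback).source); // prints "from-searcher"

        // No searcher (for example, a core in recovery): take the fallback path.
        System.out.println(resolveDirectory(Optional.empty(), fallback).source); // prints "opened-from-core"
    }
}
```

The use of {{orElseGet}} rather than {{orElse}} is deliberate here: the fallback is evaluated lazily, so the potentially expensive path runs only when there is no searcher to borrow from.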