[ 
https://issues.apache.org/jira/browse/SOLR-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061939#comment-17061939
 ] 

Richard Goodman commented on SOLR-14325:
----------------------------------------

Hi David, 

we noticed the following when trying to recover some instances on a bigger 
cluster set up, because of that we rolled back:

{code}
2020-03-17 13:27:38.288 INFO  (qtp511717113-1079) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/cores params={wt=json} status=500 QTime=4821977
2020-03-17 13:27:38.289 ERROR (qtp511717113-1079) [   ] o.a.s.s.HttpSolrCall 
null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
        at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:363)
        at 
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
        at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
        at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
        at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
        at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
        at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
        at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
        at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
        at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
        at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
        at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
        at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
        at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
        at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
        at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
        at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
        at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
        at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.Server.handle(Server.java:502)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
        at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
        at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
        at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
        at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
        at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
        at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
        at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
        at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
        at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException: 
/data/solr/solrcloud-cluster0/data/a_collection_shard9_replica_n30/data/index.20200317120438926/_1eo70_6t1.liv
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
        at java.nio.channels.FileChannel.open(FileChannel.java:287)
        at java.nio.channels.FileChannel.open(FileChannel.java:335)
        at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
        at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:181)
{code}

I had a look at your suggestions, and got a bit confused. The 
{{SolrCore.getIndexReaderFactory}} doesnt return a {{DirectoryReader}} unless 
you create a new one? Does that seem okay? 

I ended up making the following changes to the patch above;
{code}
diff --git 
a/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java 
b/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java
index 71802debbff..00495cdf2a7 100644
--- a/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java
+++ b/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java
@@ -36,6 +36,7 @@ import org.apache.solr.common.util.SimpleOrderedMap;
 import org.apache.solr.common.util.Utils;
 import org.apache.solr.core.CoreContainer;
 import org.apache.solr.core.CoreDescriptor;
+import org.apache.solr.core.IndexReaderFactory;
 import org.apache.solr.core.SolrCore;
 import org.apache.solr.core.SolrInfoBean;
 import org.apache.solr.core.snapshots.SolrSnapshotManager;
@@ -337,9 +338,10 @@ enum CoreAdminOperation implements CoreAdminOp {
               info.add("cloud", cloudInfo);
             }
             if (isIndexInfoNeeded) {
-              try (DirectoryReader dirReader = 
DirectoryReader.open(core.getDirectory())) {
+              IndexReaderFactory indexReaderFactory = 
core.getIndexReaderFactory();
+              try (DirectoryReader dirReader = 
indexReaderFactory.newReader(core.getDirectory(), core)) {
                 SimpleOrderedMap<Object> indexInfo = 
LukeRequestHandler.getIndexInfo(dirReader);
-                long size = core.getIndexSize();
+                long size = 
core.getDirectoryFactory().size(core.getDirectory());
                 indexInfo.add("sizeInBytes", size);
                 indexInfo.add("size", NumberUtils.readableSize(size));
                 info.add("index", indexInfo);
{code}

I then tested this on my small dev cluster, where I turned one node off _(to 
preserve a state of one replica)_ and then did a force expungeDelete on a core 
on the other node to make the core change state. 

I then turned on the other node, and the replica recovered fine, and 0 errors. 
Not sure if this has fixed the problem. It would be great if you could confirm 
from looking at the code, however, I'm hoping to have a bigger cluster to scale 
with our live clusters to free-play on tomorrow to continue 

> Core status could be improved to not require an IndexSearcher
> -------------------------------------------------------------
>
>                 Key: SOLR-14325
>                 URL: https://issues.apache.org/jira/browse/SOLR-14325
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: SOLR-14325.patch
>
>
> When the core status is told to request "indexInfo", it currently grabs the 
> SolrIndexSearcher but only to grab the Directory.  SolrCore.getIndexSize also 
> only requires the Directory.  By insisting on a SolrIndexSearcher, we 
> potentially block for awhile if the core is in recovery since there is no 
> SolrIndexSearcher.
> [https://lists.apache.org/thread.html/r076218c964e9bd6ed0a53133be9170c3cf36cc874c1b4652120db417%40%3Cdev.lucene.apache.org%3E]
> It'd be nice to have a solution that conditionally used the Directory of the 
> SolrIndexSearcher only if it's present so that we don't waste time creating 
> one either.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to