[ https://issues.apache.org/jira/browse/SOLR-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905617#comment-15905617 ]
Varun Thacker commented on SOLR-10259:
--------------------------------------

Hi Oliver,

Patch looks good. However, I think this patch was created against an older version of Solr? To apply the patch cleanly on master I needed to move the code into {{StatusOp.java}}.

Also, it would be nice to have a test for this. I think all we need to do is create a core, delete a segment file (something like https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.3.0/solr/core/src/test/org/apache/solr/handler/TestRestoreCore.java#L187, where location is something like {{solrCore.getDataDir()}}?), and then call STATUS.

> admin/cores?action=STATUS returns 500 when a single core has init failures
> --------------------------------------------------------------------------
>
>                 Key: SOLR-10259
>                 URL: https://issues.apache.org/jira/browse/SOLR-10259
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 5.3
>            Reporter: Oliver Bates
>            Priority: Trivial
>         Attachments: SOLR-10259.patch-1.txt, SOLR-10259.patch-2.txt
>
>
> When I have a healthy core on a node and I call
> solr/admin/cores?action=STATUS, I get the following healthy response:
> {quote}
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1607</int>
>   </lst>
>   <lst name="initFailures"/>
>   <lst name="status">
>     <lst name="whoisbanana_shard1_replica1">
>       <str name="name">whoisbanana_shard1_replica1</str>
>       <str name="instanceDir">/tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/</str>
>       <str name="dataDir">/tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/</str>
>       <str name="config">solrconfig.xml</str>
>       <str name="schema">schema.xml</str>
>       <date name="startTime">2017-03-08T15:59:50.18Z</date>
>       <long name="uptime">380431</long>
>       <str name="lastPublished">active</str>
>       <int name="configVersion">0</int>
>       <lst name="index">
>         <int name="numDocs">0</int>
>         <int name="maxDoc">0</int>
>         <int name="deletedDocs">0</int>
>         <long name="indexHeapUsageBytes">0</long>
>         <long name="version">2</long>
>         <int name="segmentCount">0</int>
>         <bool name="current">true</bool>
>         <bool name="hasDeletions">false</bool>
>         <str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@762404a0; maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
>         <lst name="userData"/>
>         <long name="sizeInBytes">71</long>
>         <str name="size">71 bytes</str>
>       </lst>
>     </lst>
>   </lst>
> </response>
> {quote}
> If I then corrupt the index file and reload, e.g. like this:
> echo "cheese" >> /tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/index/segments_1
> And then I call the same endpoint (solr/admin/cores?action=STATUS), I get a
> 500 back:
> {quote}
> <response>
>   <lst name="responseHeader">
>     <int name="status">500</int>
>     <int name="QTime">1508</int>
>   </lst>
>   <lst name="error">
>     <str name="msg">Error handling 'status' action</str>
>     <str name="trace">
> org.apache.solr.common.SolrException: Error handling 'status' action
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:755)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:231)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:196)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:146)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:676)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:443)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>     at com.apple.cie.search.auth.TrustFilter.doFilter(TrustFilter.java:44)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>     at com.apple.cie.search.id.IdFilter.doFilter(IdFilter.java:38)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:499)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>     at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.index.CorruptIndexException: misplaced codec footer (file extended?): remaining=23, expected=16 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/tmp/search_integration_test/solr1/whoisbanana_shard1_replica1/data/index/segments_1")))
>     at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:411)
>     at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:331)
>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:442)
>     at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493)
>     at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
>     at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490)
>     at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344)
>     at org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:124)
>     at org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:124)
>     at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:585)
>     at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:1202)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:743)
>     ... 31 more
>     </str>
>     <int name="code">500</int>
>   </lst>
> </response>
> {quote}
> It seems to me that what we really want is to still return a 200, but to list
> the init failures under the 'initFailures' key of the response (as seen in
> the healthy response above).
> This way, if a node is hosting 10 cores and 1 is corrupted, I can still query
> the STATUS endpoint to get information about the non-corrupted cores, and I
> can more easily determine what the problem with my corrupted core is because
> I can see the stack trace. This allows automated tooling, for instance, to go
> in and delete and re-add a replica until the day arrives that REQUESTRECOVERY
> and/or leader-initiated-recovery both work when the index is corrupted (see
> https://issues.apache.org/jira/browse/SOLR-9836).
> I am not sure which solution the world would like best, so I am proposing two
> patches.
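
To make the automated-tooling use case concrete, here is a minimal Python sketch of how a client might consume a STATUS response once init failures are reported alongside healthy cores. It is only an illustration under assumptions: the function name `summarize_status`, the core names `core_a` and `core_b`, and the sample XML are hypothetical, and the shape of a populated `initFailures` list (one named child per failed core, with the error as its text) is assumed rather than taken from the responses quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: one healthy core and one core that failed to init.
# The populated <lst name="initFailures"> shape is an assumption.
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">12</int>
  </lst>
  <lst name="initFailures">
    <str name="core_b">org.apache.lucene.index.CorruptIndexException: misplaced codec footer</str>
  </lst>
  <lst name="status">
    <lst name="core_a">
      <str name="name">core_a</str>
    </lst>
  </lst>
</response>"""


def summarize_status(xml_text):
    """Return (header status, healthy core names, {failed core: error text})."""
    root = ET.fromstring(xml_text)

    # The responseHeader status is 0 on success in the healthy response above.
    header = root.find("lst[@name='responseHeader']")
    status_code = int(header.find("int[@name='status']").text)

    # Healthy cores appear as one <lst name="..."> per core under "status".
    status_node = root.find("lst[@name='status']")
    healthy = []
    if status_node is not None:
        healthy = [core.get("name") for core in status_node.findall("lst")]

    # Assumed: each child of "initFailures" is named after the failed core
    # and carries the error message or stack trace as text.
    failures = {}
    failures_node = root.find("lst[@name='initFailures']")
    if failures_node is not None:
        for child in failures_node:
            failures[child.get("name")] = "".join(child.itertext()).strip()

    return status_code, healthy, failures
```

Calling `summarize_status(SAMPLE)` gives the tool both lists from a single request, so it could decide to delete and re-add only the replicas named in the failures mapping while leaving the healthy cores alone.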