[ https://issues.apache.org/jira/browse/SOLR-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Throgmorton resolved SOLR-15228. ------------------------------------- Resolution: Duplicate I guess Jira made another bug when I hit refresh? > Single host in a bad state can block collection creation for the cluster with > autoscaling enabled > ------------------------------------------------------------------------------------------------- > > Key: SOLR-15228 > URL: https://issues.apache.org/jira/browse/SOLR-15228 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling > Affects Versions: 8.2 > Reporter: Andy Throgmorton > Priority: Minor > > We configured a SolrCloud cluster (running 8.2) with this cluster autoscaling > policy: > {noformat} > { > "set-cluster-preferences":[ > { > "minimize":"cores", > "precision":5 > }, > { > "maximize":"freedisk", > "precision":25 > }, > { > "minimize":"sysLoadAvg", > "precision":10 > }], > "set-cluster-policy":[ > { > "replica": "<2", > "node": "#ANY" > }], > "set-trigger": { > "name":".auto_add_replicas", > "event":"nodeLost", > "waitFor":"10m", > "enabled":true, > "actions":[ > { > "name":"auto_add_replicas_plan", > "class":"solr.AutoAddReplicasPlanAction"}, > { > "name":"execute_plan", > "class":"solr.ExecutePlanAction"}] > } > }{noformat} > A node was rebooted at one point, and when that node came back, it had > trouble establishing a connection with ZK when it was initializing the > CoreContainer. As a result, it returns 404s for (I think?) all admin requests. > Now, any call to create a collection in that cluster throw an error, with > this stacktrace: > {noformat} > 2021-03-04 12:47:03.615 ERROR > (OverseerThreadFactory-141-thread-4-processing-n:HOST_REDACTED:8983_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: COLLECTON_REDACTED > operation: create failed:org.apache.solr.common.SolrException: Error getting > replica locations : unable to get autoscaling policy session > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:195) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.solr.common.SolrException: unable to get autoscaling > policy session > at > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:129) > at > org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:382) > at > org.apache.solr.cloud.api.collections.Assign$PolicyBasedAssignStrategy.assign(Assign.java:630) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:410) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:190) > ... 6 more > Caused by: org.apache.solr.common.SolrException: > org.apache.solr.common.SolrException: Error getting remote info > at > org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:78) > at > org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139) > at > org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128) > at org.apache.solr.client.solrj.cloud.autoscaling.Row.<init>(Row.java:71) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.<init>(Policy.java:575) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358) > at > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492) > at > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457) > at > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513) > at > org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:127) > ... 10 more > Caused by: org.apache.solr.common.SolrException: Error getting remote info > at > org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:364) > at > org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76) > ... 20 more > Caused by: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at https://HOSTNAME_REDACTED:8983/solr: Expected mime type > application/octet-stream but got text/html. <html> > <head> > <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> > <title>Error 404 Not Found</title> > </head> > <body><h2>HTTP ERROR 404</h2> > <p>Problem accessing /solr/admin/metrics. Reason: > <pre> Not Found</pre></p><h3>Caused > by:</h3><pre>javax.servlet.ServletException: > javax.servlet.UnavailableException: Error processing the request. > CoreContainer is either not initialized or shutting down. > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:168) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.eclipse.jetty.server.Server.handle(Server.java:505) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321) > at > org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) > at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) > at > org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: javax.servlet.UnavailableException: Error processing the request. > CoreContainer is either not initialized or shutting down. > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:369) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152) > ... 21 more > </pre> > <h3>Caused by:</h3><pre>javax.servlet.UnavailableException: Error processing > the request. CoreContainer is either not initialized or shutting down. > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:369) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > at org.eclipse.jetty.server.Server.handle(Server.java:505) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321) > at > org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) > at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) > at org.eclipse.jetty....{noformat} > I looked through the Solr code and to me it looks like: > * Client asks to create collection (CreateCollectionCmd) > * PolicyHelper.getReplicaLocations tries to build a map of where every > replica is > * To do that, it creates a SessionRef, which needs to populate its cache > first > * SessionRef attempts to collect all metrics, [including metrics from every > node|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Policy.java#L583] > (or {{Row}}) > * SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo makes the > remote call > ** It will retry on certain errors (see below), but not for this error > ({{HttpSolrClient$RemoteSolrException}}), which bubbles up and fails the > request > ** > [https://github.com/apache/lucene-solr/blob/branch_8_8/solr/solrj/src/java/org/apache/solr/client/solrj/impl/SolrClientNodeStateProvider.java#L310-L338] > > I realize this autoscaling code is gone in 9.x, but at least wanted to report > this issue for documentation purposes, in case others see this. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org