[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353442#comment-16353442 ]

Gopalakrishnan B commented on SOLR-3274:

Hi Team,
Is this issue resolved in the latest Solr 7.x (7.2.1)?
Thanks.

> ZooKeeper related SolrCloud problems
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0-ALPHA
> Environment: Any
> Reporter: Per Steffensen
> Priority: Major
>
> Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7
> Solr servers running 28 slices of the same collection (collA). All slices
> have one replica (two shards all in all: leader + replica), 56 cores all in
> all (8 shards on each Solr instance). But anyway...
> Besides the problem reported in SOLR-3273, the system seems to run fine under
> high load for several hours, but eventually errors like the ones shown below
> start to occur. I might be wrong, but they all seem to indicate some kind of
> instability in the collaboration between Solr and ZooKeeper. I have to say
> that I haven't been there to check ZooKeeper "at the moment those
> exceptions occur", but basically I don't believe the exceptions occur because
> ZooKeeper is not running stably. At least when I go and check ZooKeeper
> through other "channels" (e.g. my Eclipse ZK plugin), it is always accepting
> my connection and generally seems to be doing fine.
> Exception 1) Often the first error we see in solr.log is something like this:
> {code}
> Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> I believe this error basically occurs because SolrZkClient.isConnected
> reports false, which means that its internal "keeper.getState" does not
> return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED
> for a long time, since this error starts occurring after several hours of
> processing without this problem showing. But why is it suddenly not connected
> anymore?!
> Exception 2) We also see errors like the following, and if I'm not mistaken,
> they start occurring shortly after "Exception 1)" (above) shows for the first
> time:
> {code}
> Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log
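The reporter's hypothesis is that `SolrZkClient.isConnected` reports false because the underlying ZooKeeper client state is no longer `ZooKeeper.States.CONNECTED`, and that the update path then refuses writes. Below is a minimal, self-contained Java model of that kind of guard. It is not Solr's actual `zkCheck`. The connection check is abstracted as a `BooleanSupplier`, and several details are assumptions for illustration: the short wait for reconnection before failing, and the names `ZkCheckSketch` and `UpdatesDisabledException`. The sketch suggests why a client that is only briefly in a non-CONNECTED state (for example, while reconnecting) could be enough to make updates fail with this message, even while the ZooKeeper servers themselves look healthy.

```java
import java.util.function.BooleanSupplier;

/**
 * Simplified model of an update-path guard like DistributedUpdateProcessor.zkCheck.
 * NOT Solr's code: the wait-then-fail behaviour and all names are assumptions.
 */
public class ZkCheckSketch {

    /** Hypothetical exception carrying the same message Solr logs. */
    static class UpdatesDisabledException extends RuntimeException {
        UpdatesDisabledException() {
            super("Cannot talk to ZooKeeper - Updates are disabled.");
        }
    }

    /**
     * Returns as soon as isConnected reports true. If it is still false once
     * waitMs has elapsed, the update is rejected.
     */
    static void zkCheck(BooleanSupplier isConnected, long waitMs, long pollMs) {
        long deadline = System.nanoTime() + waitMs * 1_000_000L;
        while (!isConnected.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                throw new UpdatesDisabledException();
            }
            try {
                Thread.sleep(pollMs); // give the client a chance to reconnect
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new UpdatesDisabledException();
            }
        }
    }

    public static void main(String[] args) {
        zkCheck(() -> true, 0, 10); // connected: the update proceeds
        try {
            zkCheck(() -> false, 50, 10); // never reconnects: the update is rejected
        } catch (UpdatesDisabledException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, only two inputs decide the outcome: how long the client stays disconnected and how long the guard is willing to wait. Whether the ZooKeeper ensemble is healthy does not enter into it.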
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166909#comment-16166909 ]

Erick Erickson commented on SOLR-3274:

No, it's still marked as "unresolved". Have you read through the discussion and tried the suggestions of increasing the ZK timeouts?
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165782#comment-16165782 ]

Navneet Khanna commented on SOLR-3274:

Hi Guys, Is this issue fixed? I am getting this issue with solr 6.6.0 and zk 3.4.10.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764336#comment-15764336 ]

Yago Riveiro commented on SOLR-3274:

I'm hitting this a lot in 6.3.0 and I don't know why. My TTL for ZooKeeper is 120s, and the GC log shows no pauses longer than 100ms.
Is there some configuration that shows the reason for the failure to talk to ZooKeeper, like a connection timeout or something else?

org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:275)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:240)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:158)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:518)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:745)
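This relates to the 120s TTL above and to Erick's suggestion of increasing the ZK timeouts. The session timeout a client requests is bounded by the ZooKeeper server. The ZooKeeper administrator documentation gives defaults of 2 times `tickTime` for `minSessionTimeout` and 20 times `tickTime` for `maxSessionTimeout`. With `tickTime=2000` (the value in ZooKeeper's sample configuration), a 120s request would therefore be capped at 40s unless `maxSessionTimeout` is raised in `zoo.cfg`. The plain-Java sketch below models that clamping under a few assumptions:

- the class and method names are hypothetical;
- `-1` stands for "not set in zoo.cfg";
- the 120s value above is assumed to be the `zkClientTimeout` that Solr requests.

```java
/**
 * Sketch of how a ZooKeeper server bounds the session timeout a client requests.
 * Assumes the documented defaults minSessionTimeout = 2 * tickTime and
 * maxSessionTimeout = 20 * tickTime; -1 means "not set in zoo.cfg".
 */
public class SessionTimeoutSketch {

    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs,
                                   int minSessionTimeoutMs, int maxSessionTimeoutMs) {
        int min = minSessionTimeoutMs == -1 ? 2 * tickTimeMs : minSessionTimeoutMs;
        int max = maxSessionTimeoutMs == -1 ? 20 * tickTimeMs : maxSessionTimeoutMs;
        int timeout = requestedMs;
        if (timeout < min) timeout = min; // raise requests below the floor
        if (timeout > max) timeout = max; // cap requests above the ceiling
        return timeout;
    }

    public static void main(String[] args) {
        // 120 s requested against tickTime=2000 with no overrides.
        System.out.println(negotiatedTimeoutMs(120_000, 2_000, -1, -1));      // 40000
        // Same request after raising maxSessionTimeout to 120 s.
        System.out.println(negotiatedTimeoutMs(120_000, 2_000, -1, 120_000)); // 120000
    }
}
```

If this model holds, raising only the Solr-side timeout would have no effect beyond the server's ceiling. It may therefore be worth checking the negotiated value rather than the requested one.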
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1511#comment-1511 ]

Dragos C commented on SOLR-3274:

It happened to me on Solr 5.5.0 with the default setup. I just unzipped the file, started Solr in cloud mode (shards: 1, replication factor: 1) with a 6GB memory limit (out of 20GB), Java 1.8.0_73 x64, Windows Server 2008 R2 Standard, and pushed the core configuration files. I have one ZooKeeper instance with one Solr node behind it. I added some documents but, as Per Steffensen mentioned, CPU usage was barely around 70% (with various spikes above this level). After a while, I am _always_ getting HTTP status 503, and the reply from Solr is "Cannot talk to ZooKeeper - Updates are disabled.".

Solr log:
2016-03-03 12:34:20.902 INFO (qtp1450821318-4031) [c:CORE s:shard1 r:core_node1 x:CORE_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [CORE_shard1_replica1] webapp=/solr path=/update params={}{} 0 0
2016-03-03 12:34:20.902 ERROR (qtp1450821318-4031) [c:CORE s:shard1 r:core_node1 x:CORE_shard1_replica1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1469)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:667)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:94)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Unknown Source)

I have an automated tool that generates the XML documents to be pushed. Some time after I receive this error, I start receiving 404.
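A bulk-loading tool like the one described above could treat 503 responses as transient on the client side, giving the node time to reconnect. The sketch below is a hypothetical helper, not part of SolrJ. The actual HTTP call is abstracted as an `IntSupplier` that returns a status code, and the attempt limit and doubling backoff are illustrative choices. At best it smooths over short disconnects. It does nothing about the persistent 503s and later 404s reported here, which would still need to be diagnosed on the server side.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical client-side helper: retries a request while it returns HTTP 503
 * (e.g. "Cannot talk to ZooKeeper - Updates are disabled."), doubling the wait
 * between attempts. Not part of SolrJ; the request itself is abstracted away.
 */
public class RetryOn503 {

    static final int SERVICE_UNAVAILABLE = 503;

    /**
     * Calls send up to maxAttempts times, stopping at the first non-503 status.
     * Returns the last status observed so the caller can decide what to do next.
     */
    static int sendWithRetry(IntSupplier send, int maxAttempts, long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        int status = send.getAsInt();
        for (int attempt = 1; status == SERVICE_UNAVAILABLE && attempt < maxAttempts; attempt++) {
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop retrying, keep interrupt status
                return status;
            }
            backoffMs *= 2; // exponential backoff
            status = send.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server: two 503 responses, then success.
        int status = sendWithRetry(() -> ++calls[0] <= 2 ? 503 : 200, 5, 100);
        System.out.println(status + " after " + calls[0] + " attempts"); // 200 after 3 attempts
    }
}
```

Any non-503 status, including 404, is returned immediately rather than retried. That keeps a missing core or collection from being masked by retries.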
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154302#comment-15154302 ]

Abhijit Tilak commented on SOLR-3274:

I will be out of office on 2/19. Please email your queries or questions and I will try my best to address them as soon as possible. If you need immediate help, please contact Customer Hub manager: Juan I. Rodriguez. Thanks, Abhijit
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154297#comment-15154297 ]

Scott Blum commented on SOLR-3274:

Don't forget to rule out GC as the problem. Long GC pauses in Solr can cause the ZK session to time out. There are probably some bugs in various places when it comes to getting back into a consistent state after that happens.
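To make the GC point concrete: a stop-the-world pause also stops the ZooKeeper client's own threads, so the question is how long a pause lasts relative to the negotiated session timeout. The sketch below is a rough classifier based on two assumptions about the ZooKeeper Java client:

- it treats the connection as lost after hearing nothing from the server for about two-thirds of the session timeout, which would make `isConnected` report false while it reconnects;
- the server expires the session only after the full timeout.

The thresholds are approximations, and the names `GcPauseRisk` and `classify` are hypothetical.

```java
/**
 * Rough classifier for one stop-the-world pause against a negotiated ZooKeeper
 * session timeout. Assumption: the client gives up on the connection after about
 * 2/3 of the session timeout without hearing from the server, and the server
 * expires the session after the full timeout. Approximations, not guarantees.
 */
public class GcPauseRisk {

    enum Risk { OK, LIKELY_DISCONNECT, LIKELY_EXPIRY }

    static Risk classify(long pauseMs, long sessionTimeoutMs) {
        long readTimeoutMs = sessionTimeoutMs * 2 / 3; // assumed client-side read timeout
        if (pauseMs >= sessionTimeoutMs) return Risk.LIKELY_EXPIRY;
        if (pauseMs >= readTimeoutMs) return Risk.LIKELY_DISCONNECT;
        return Risk.OK;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 30_000; // example value only
        for (long pause : new long[] {100, 25_000, 45_000}) {
            System.out.println(pause + " ms -> " + classify(pause, sessionTimeoutMs));
        }
    }
}
```

This model separates two failure modes. A pause long enough to cause a disconnect, but not an expiry, would show up as transient "Cannot talk to ZooKeeper" errors. A pause long enough to expire the session is the case that needs the recovery path Scott mentions.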
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153691#comment-15153691 ] Nitin Sharma commented on SOLR-3274:
I also ran into this issue with Solr 4.6.1. The main issue is that updates go through fine in Solr, but the Solr-ZK interactions are broken and a few shards are marked as down. They never recover or reconnect to ZK unless Solr is force-restarted. I even tried growing the ZK ensemble to 7 and then 9 nodes, but it did not help. Any suggestions on how to deal with this?
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123736#comment-15123736 ] Stephan Lagraulet commented on SOLR-3274:
I think this message could be misleading, given what is happening on our servers.
First we receive a disconnect event:
{code}
2016-01-29 14:52:38.868 INFO (zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@478ca953 name:ZooKeeperConnection Watcher:solrnode013:2181,solrnode014:2181,solrnode015:2181 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
2016-01-29 14:52:38.868 INFO (zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager zkClient has disconnected
{code}
Then we receive an "Expired" event:
{code}
2016-01-29 14:52:40.028 INFO (zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@478ca953 name:ZooKeeperConnection Watcher:solrnode013:2181,solrnode014:2181,solrnode015:2181 got event WatchedEvent state:Expired type:None path:null path:null type:None
2016-01-29 14:52:40.028 INFO (zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
2016-01-29 14:52:40.029 INFO (zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
{code}
Then the connection is established:
{code}
2016-01-29 14:52:40.393 INFO (zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@478ca953 name:ZooKeeperConnection Watcher:a01solrf013.cdweb.biz:2181,a01solrf014.cdweb.biz:2181,a01solrf015.cdweb.biz:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
2016-01-29 14:52:40.393 INFO (zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2016-01-29 14:52:40.393 INFO (zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread) [ ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper reestablished.
2016-01-29 14:52:40.393 INFO (zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread) [ ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing core states after session expiration.
2016-01-29 14:52:40.034 ERROR (qtp1057941451-80095) [c:offers_full s:shard3 r:core_node7 x:offers_full_shard3_replica2] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1459)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:660)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:260)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:145)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
{code}
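The sequence in this comment can be replayed against a simplified rule: an update is accepted only if the most recent session event was SyncConnected. This is a hypothetical sketch, not Solr's logic. The `KeeperState` enum copies three names from ZooKeeper's `Watcher.Event.KeeperState`, the timestamps are milliseconds after 14:52:00.000 taken from the log lines, and the connected state at time 0 is an assumption. Under this rule, the ERROR logged at 14:52:40.034 falls in the window between Expired (40.028) and SyncConnected (40.393).

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Replays the session events from the log and asks, for a given timestamp,
 * whether a simplified "connected only after SyncConnected" rule would have
 * accepted an update. The rule is an assumption, not Solr's code.
 */
public class SessionEventReplay {

    /** Names borrowed from ZooKeeper's Watcher.Event.KeeperState. */
    public enum KeeperState { SyncConnected, Disconnected, Expired }

    // Event time (ms after 14:52:00.000) -> state reported at that time.
    private final TreeMap<Long, KeeperState> events = new TreeMap<>();

    public void record(long millis, KeeperState state) {
        events.put(millis, state);
    }

    /** True if the latest event at or before {@code millis} is SyncConnected. */
    public boolean acceptsUpdateAt(long millis) {
        Map.Entry<Long, KeeperState> last = events.floorEntry(millis);
        return last != null && last.getValue() == KeeperState.SyncConnected;
    }

    /** Builds the replay from the timestamps in the comment's log excerpt. */
    public static SessionEventReplay fromLog() {
        SessionEventReplay replay = new SessionEventReplay();
        replay.record(0L, KeeperState.SyncConnected);      // assumed: connected before the excerpt
        replay.record(38_868L, KeeperState.Disconnected);  // 14:52:38.868
        replay.record(40_028L, KeeperState.Expired);       // 14:52:40.028
        replay.record(40_393L, KeeperState.SyncConnected); // 14:52:40.393
        return replay;
    }

    public static void main(String[] args) {
        SessionEventReplay replay = fromLog();
        long[] probes = {38_000L, 40_034L, 40_500L};
        for (long t : probes) {
            System.out.println(t + " ms -> "
                    + (replay.acceptsUpdateAt(t) ? "accepted" : "rejected"));
        }
    }
}
```

Running `main` prints `38000 ms -> accepted`, `40034 ms -> rejected` and `40500 ms -> accepted`, which matches the comment's point: the rejected update arrived during the short reconnect window rather than during a long outage.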
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121858#comment-15121858 ] Hal Deadman commented on SOLR-3274:
We are also running SolrCloud 5.4.0, and we are seeing this issue in production today. I think it happened in the past and cleared itself up (unless the servers were rebooted without my knowledge), but we are in the process of getting the rights to restart ZooKeeper to see if that clears it up.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121836#comment-15121836 ] Dan Kogan commented on SOLR-3274:
It appears we also ran into this issue using SolrCloud 5.4. This is the stack trace we saw:
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1459)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:660)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:74)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:260)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:525)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:200)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979672#comment-14979672 ] Greg Pendlebury commented on SOLR-3274:
FWIW, we ran into this issue today as well, and nothing worked until ZK was restarted. I would love to think that Solr could detect this issue, but it smells like a ZK bug to me.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738200#comment-14738200 ] Alexander S. commented on SOLR-3274:
Hi, just wanted to let you know that adding 2 new ZK servers (so I now have 5 running ZK instances) improved the situation a lot. But I found one weird thing with ZK:
{code}
java.net.UnknownHostException: zoo5.devops
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,235 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address zoo2.devops:3888
java.net.UnknownHostException: zoo2.devops
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,235 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 1 at election address zoo1.devops:3888
java.net.UnknownHostException: zoo1.devops
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,236 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 4 at election address zoo4.devops:3888
java.net.UnknownHostException: zoo4.devops
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
{code}
I had just opened 2 SSH sessions to that server and was monitoring the log with tail. While ZK was posting these errors, I was able to ping the zoo1/2/4/5.devops servers and to connect to ZK on them with telnet. So it seems something can go wrong inside ZK itself. At the same time, I saw these "Cannot talk to ZooKeeper" errors in Solr. Eventually I simply restarted this broken ZK instance, and everything is fine again. So I guess Solr was trying to connect to exactly this broken ZK instance (I can't say for sure, since Solr does not log which instance it failed to connect to).
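The check in this comment was done from the shell with ping and telnet. A minimal diagnostic sketch, assuming you want to repeat the name lookup from inside a JVM, resolves each ZK peer name with `InetAddress.getByName`. The class name `ZkPeerResolver` is hypothetical, and the host names are the ones from the log above; substitute your own. If the names resolve here while the ZK log still reports `UnknownHostException`, the problem is more likely inside the running ZK process than in DNS itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical diagnostic: resolve each ZooKeeper peer host name from inside
 * a JVM and report either the resolved address or the lookup failure.
 */
public class ZkPeerResolver {

    /** Maps each host (in input order) to its address or an error string. */
    public static Map<String, String> resolveAll(List<String> hosts) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                result.put(host, InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                result.put(host, "UNRESOLVED (" + e.getMessage() + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Peer names taken from the log in the comment above.
        List<String> peers = List.of("zoo1.devops", "zoo2.devops", "zoo4.devops", "zoo5.devops");
        resolveAll(peers).forEach((host, addr) -> System.out.println(host + " -> " + addr));
    }
}
```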
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605355#comment-14605355 ] Arcadius Ahouansou commented on SOLR-3274:
We have a similar issue on Solr 5.2.1, with the log entry below. We have 8 Solr nodes and 5 ZK nodes. The 8 Solr nodes are identical, with only 1 collection and only 1 replica per node.
The error {code}Cannot talk to ZooKeeper - Updates are disabled{code} is not very helpful IMHO. It would be good to also log all the ZK nodes that Solr tried to connect to before throwing the error.
{code}
ERROR - 2015-06-29 00:42:04.912; [collectionA-01 shard1 core_node8 collectionA-01_shard1_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1482)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1602)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
{code}
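The suggestion above could be approximated on the Solr side by listing the configured ensemble in the error message. Solr cannot easily see which individual server the ZooKeeper client last tried, so this hypothetical sketch (`ZkHostDescriber` is not part of Solr) only splits a connect string of the form `host:port[,host:port...][/chroot]` and appends every configured server. The default port 2181 for entries without a port is an assumption, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: split a ZooKeeper connect string so that an error
 * message can name every configured ZK server. Not part of Solr.
 */
public class ZkHostDescriber {

    // Assumption: entries without an explicit port use the usual client port.
    static final int DEFAULT_ZK_PORT = 2181;

    /** Returns "host:port" entries; an optional "/chroot" suffix is dropped. */
    public static List<String> servers(String zkHost) {
        int slash = zkHost.indexOf('/');
        String hostPart = slash >= 0 ? zkHost.substring(0, slash) : zkHost;
        List<String> servers = new ArrayList<>();
        for (String entry : hostPart.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            // Simple "host" vs "host:port" check; IPv6 literals are not handled.
            servers.add(trimmed.contains(":") ? trimmed : trimmed + ":" + DEFAULT_ZK_PORT);
        }
        return servers;
    }

    /** Builds an error message that names the configured ensemble. */
    public static String describeFailure(String zkHost) {
        return "Cannot talk to ZooKeeper - Updates are disabled. Configured ZK servers: "
                + String.join(", ", servers(zkHost));
    }

    public static void main(String[] args) {
        System.out.println(describeFailure("solrnode013:2181,solrnode014:2181,solrnode015/solr"));
    }
}
```

For the example in `main`, the message ends with `Configured ZK servers: solrnode013:2181, solrnode014:2181, solrnode015:2181`.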
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094033#comment-14094033 ]

Per Steffensen commented on SOLR-3274:

bq. Both nodes have 16 CPU cores, 48G of memory and RAID 10 (SSD), I thought it would be hard to get performance issues there

Yes, that should be hard. Well done! :-)

bq. Anyway, adding a separate node with 4th zookeeper instance might help, right?

A ZK cluster should always have an odd number of nodes, so if you want to add ZK instances you should add two. I would rather move the two ZK instances running on Solr machines to two machines not running Solr, so that you end up with 3 ZK instances, none of which run on machines that also run Solr. We never run ZK on the same machines as Solr; we have had bad experiences with that, losing ZK connections all the time. You will still occasionally lose ZK connections from Solr nodes when they are under high load, but usually they reconnect fairly quickly (before the session timeout) and you can continue immediately. I have been working on an optimized ZK where you do not lose ZK connections nearly as often, but finishing that work is currently not a priority.
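Per's placement advice can be checked by counting which ZK servers are still up when every Solr machine stalls at the same moment (heavy indexing load or a long GC pause). The following Python sketch is illustrative only: host names such as `zk1` and `solr1` are hypothetical, and it simply treats a stalled host's ZK server as down, ignoring session timeouts and partial network failures.

```python
from __future__ import annotations


def has_quorum(zk_hosts: list[str], down_hosts: set[str]) -> bool:
    """True if a strict majority of the ZK servers run on hosts that are up."""
    alive = sum(1 for host in zk_hosts if host not in down_hosts)
    return alive > len(zk_hosts) // 2


def survives_solr_stall(zk_hosts: list[str], solr_hosts: list[str]) -> bool:
    """True if ZK keeps a quorum even when every Solr host stalls at once."""
    return has_quorum(zk_hosts, set(solr_hosts))


# Hypothetical host names; each list entry is the host one ZK server runs on.
solr = ["solr1", "solr2"]
layouts = {
    "3 ZK, 2 sharing machines with Solr": ["zk1", "solr1", "solr2"],
    "4 ZK, 2 sharing machines with Solr": ["zk1", "zk2", "solr1", "solr2"],
    "5 ZK, 2 sharing machines with Solr": ["zk1", "zk2", "zk3", "solr1", "solr2"],
    "3 ZK on dedicated machines": ["zk1", "zk2", "zk3"],
}

for name, zk_hosts in layouts.items():
    print(f"{name}: survives Solr stall = {survives_solr_stall(zk_hosts, solr)}")
```

Under these assumptions, the co-located 3-server layout and the 4-server variant both lose quorum when the two Solr hosts stall. Three dedicated servers, or a 5-server ensemble with only two co-located, keep it.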
Re: [jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
bq: Anyway, adding a separate node with 4th zookeeper instance might help, right?

No. The formula for a quorum is (num_zookeeper_nodes)/2 + 1, using integer division. So with a fourth node, _three_ of them must be up, i.e. only one can be unreachable, which is the same number as with 3. Having an even number of ZK instances actually makes failure _more_ likely.

bq: ...since they share same nodes with Solr instances

As separate processes? Or embedded? If the latter, the cure is obvious. If the former, consider running the ZK instances on other nodes perhaps...

Best,
Erick

On Mon, Aug 11, 2014 at 8:28 AM, Alexander S. (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092884#comment-14092884 ]
>
> Alexander S. commented on SOLR-3274:
>
> Hi, thanks for the response.
>
> bq. Well you never know
> I've checked the nodes' status; the 3rd node was online all the time and there was no load on it.
>
> bq. In a 3-node ZK-cluster you need at least 2 healthy ZK-nodes connected with each other for the cluster to be operational.
> That should be the problem, since the 2 other ZK instances might be (theoretically) unavailable because of heavy load (since they share same nodes with Solr instances). Both nodes have 16 CPU cores, 48G of memory and RAID 10 (SSD), I thought it would be hard to get performance issues there.
> Anyway, adding a separate node with 4th zookeeper instance might help, right?
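Erick's arithmetic can be checked numerically. The sketch below assumes, purely for illustration, that each ZK server is down independently with the same probability `p`. Real failures (for example several ZK servers sharing overloaded Solr machines) are often correlated, so treat the numbers as a rough comparison only.

```python
from math import comb


def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-server ensemble: n // 2 + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that can be down while the rest still form a quorum."""
    return n - quorum_size(n)


def outage_probability(n: int, p: float) -> float:
    """Chance that too many servers are down for a quorum, assuming each
    server is down independently with probability p (binomial model)."""
    f = tolerated_failures(n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(f + 1, n + 1))


for n in (3, 4, 5):
    print(f"n={n} quorum={quorum_size(n)} tolerates={tolerated_failures(n)} "
          f"P(outage, p=0.1)={outage_probability(n, 0.1):.4f}")
```

With p = 0.1 the 4-server ensemble still tolerates only one failure, and its chance of losing quorum (about 0.052) is higher than the 3-server ensemble's (0.028). That is the "more likely" effect described above; only going to 5 servers buys a second tolerated failure.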
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092884#comment-14092884 ]

Alexander S. commented on SOLR-3274:

Hi, thanks for the response.

bq. Well you never know

I've checked the nodes' status; the 3rd node was online all the time and there was no load on it.

bq. In a 3-node ZK-cluster you need at least 2 healthy ZK-nodes connected with each other for the cluster to be operational.

That should be the problem, since the 2 other ZK instances might be (theoretically) unavailable because of heavy load (since they share same nodes with Solr instances). Both nodes have 16 CPU cores, 48G of memory and RAID 10 (SSD), I thought it would be hard to get performance issues there. Anyway, adding a separate node with 4th zookeeper instance might help, right?
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092864#comment-14092864 ]

Per Steffensen commented on SOLR-3274:

bq. That's simply impossible for all 3 zookeeper instances to get offline simultaneously.

Well you never know

bq. Since there's always at least 1 stable ZK node this seems like a communication/reliability bug in Solr.

In a 3-node ZK-cluster you need at least 2 healthy ZK-nodes connected with each other for the cluster to be operational. A majority of the nodes always needs to agree for an operation to be carried out. This way you know that at any time only one set of ZK-nodes in a ZK-cluster can successfully carry out operations: e.g. when there is no network connection between two sets of ZK-nodes (but the connections between the nodes within each set are OK), only one set can contain a majority of the total number of ZK-nodes in the cluster.
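Per's majority argument can be verified exhaustively for small ensembles: for every way of splitting the servers into two sides that cannot reach each other, at most one side holds a strict majority. This Python sketch is a toy model of the counting rule only, not of ZooKeeper's actual leader-election protocol.

```python
def is_majority(group_size: int, ensemble_size: int) -> bool:
    """A group of ZK servers can carry out operations only with a strict majority."""
    return group_size > ensemble_size // 2


def splits(ensemble_size: int):
    """Yield (size_a, size_b, a_ok, b_ok) for every way of partitioning the
    ensemble into two sides that cannot reach each other."""
    for size_a in range(ensemble_size + 1):
        size_b = ensemble_size - size_a
        yield (size_a, size_b,
               is_majority(size_a, ensemble_size),
               is_majority(size_b, ensemble_size))


for n in (3, 4):
    for size_a, size_b, a_ok, b_ok in splits(n):
        # Both sides can never hold a majority at once, so there is no "split brain".
        assert not (a_ok and b_ok)
        print(f"n={n}: {size_a} vs {size_b} -> side A ok={a_ok}, side B ok={b_ok}")
```

The output also illustrates both points in this thread: a single healthy server out of three (the 1 vs 2 split) cannot operate on its own, and a 4-server ensemble split 2/2 has no operational side at all.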
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090519#comment-14090519 ]

Alexander S. commented on SOLR-3274:

Suffering from the same problem; it happens during high load on the nodes. Our setup is pretty simple: 4 nodes (2 shards, 2 replicas) and 3 zookeeper instances. Everything is running on 3 physical nodes:
* 1st node: 1 zookeeper instance
* 2nd node: 2 shards and 1 zookeeper
* 3rd node: 2 replicas and 1 zookeeper

We run the Solr instances this way:
java -Xms2G -Xmx16G -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -DzkHost=zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf -Dbootstrap_conf=true -DnumShards=2 -jar start.jar etc/jetty.xml

Once the load increases we get:
{code}
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
	at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1306)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:981)
	at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
	at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:349)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)
{code}

That's simply impossible for all 3 zookeeper instances to get offline simultaneously. I understand that the 2nd and 3rd nodes could be overloaded because of Solr, but the 1st node runs just a single zookeeper instance and the load average on that node is close to zero. Since there's always at least 1 stable ZK node this seems like a communication/reliability bug in Solr.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494029#comment-13494029 ]

Jay Hacker commented on SOLR-3274:

Not sure if it's the same problem, but I have seen similar issues with the 4.0.0 release. I get errors like:
{code}
ClusterState says we are the leader, but locally we don't think so
There was a problem finding the leader in zk
forwarding update to http://solr83:4000/solr/main/ failed - retrying ...
Cannot open channel to 3 at election address solr84/X.X.X.X:5002
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
{code}
I'm running zookeeper embedded, and the problem turns out to be long garbage collection pauses. During a stop-the-world collection, zookeeper times out. It's especially bad if the system has to page in a lot of memory from disk. This would explain why things run fine for a while, until memory fills up and a big GC is needed. This is quite repeatable for us: just index until memory is pretty full, then wait for a long GC or trigger one manually with VisualVM. You can try different garbage collectors or specify maximum pause times (I've had some luck with {{-XX:+UseConcMarkSweepGC}}), but the best solution may be to run zookeeper in an independent JVM.
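The failure mode described above can be illustrated with a toy model of a ZooKeeper session: the server expires a session if it hears nothing from the client for longer than the session timeout, and a stop-the-world GC pause silences the client for the whole pause. This Python sketch models only the client-session side (with embedded ZK the pause also stalls the ZK server itself), and the 10-second timeout and the timestamps are hypothetical values, not ZooKeeper's actual implementation.

```python
from __future__ import annotations


def expired_at(heard_from_client: list[float], session_timeout: float) -> float | None:
    """Return the time at which the server would expire the session, or None.

    heard_from_client: sorted times (seconds) at which the server received
    anything (a request or a ping) from the client. If any gap exceeds
    session_timeout, the session expires session_timeout after the last
    message before that gap.
    """
    for prev, nxt in zip(heard_from_client, heard_from_client[1:]):
        if nxt - prev > session_timeout:
            return prev + session_timeout
    return None


SESSION_TIMEOUT = 10.0  # hypothetical timeout in seconds

# The client is heard from every 3 s: every gap is well below the timeout.
steady = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0]
# Same client, but a stop-the-world GC pause leaves a 14 s gap after t=9.
with_gc_pause = [0.0, 3.0, 6.0, 9.0, 23.0, 26.0]

print(expired_at(steady, SESSION_TIMEOUT))
print(expired_at(with_gc_pause, SESSION_TIMEOUT))
```

With these inputs the first call returns `None` and the second returns `19.0`: the session is gone before the paused JVM resumes, which fits the pattern of things running fine until a large collection happens.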
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241186#comment-13241186 ]

Mark Miller commented on SOLR-3274:
---

If you don't solve the issue of the ZK session expirations, this is no real surprise. The larger the index gets, the longer recoveries can take, until you end up in a situation similar to the one you had. The key is understanding why the connection to ZooKeeper is dropping.

> ZooKeeper related SolrCloud problems
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Any
> Reporter: Per Steffensen
> Assignee: Mark Miller
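One way to follow up on the point that the key is understanding why the connection to ZooKeeper drops is to record how long each disconnection lasts and compare that with the session timeout. The following is a minimal, hypothetical helper, assuming you feed it state changes with timestamps yourself (for example from your own ZooKeeper Watcher). The class name, its methods and the State values are illustrative only and are not part of Solr or the ZooKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Solr or ZooKeeper API): records how long each
// ZooKeeper disconnection lasted. State names only mirror ZooKeeper's
// KeeperState events; the caller supplies the events and timestamps.
class ZkDisconnectTracker {
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private final long sessionTimeoutMs;
    private long disconnectedAt = -1;               // -1 means "currently connected"
    private final List<Long> gapsMs = new ArrayList<>();

    ZkDisconnectTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Feed one state change observed at time nowMs (milliseconds). */
    void onEvent(State state, long nowMs) {
        switch (state) {
            case DISCONNECTED:
                if (disconnectedAt < 0) disconnectedAt = nowMs;   // a gap starts
                break;
            case SYNC_CONNECTED:
            case EXPIRED:
                // Reconnecting or learning that the session expired ends the gap.
                if (disconnectedAt >= 0) {
                    gapsMs.add(nowMs - disconnectedAt);
                    disconnectedAt = -1;
                }
                break;
        }
    }

    /** Longest disconnection seen so far, or 0 if there was none. */
    long longestGapMs() {
        long max = 0;
        for (long g : gapsMs) max = Math.max(max, g);
        return max;
    }

    /** Number of disconnections that lasted longer than the session timeout. */
    long gapsExceedingTimeout() {
        return gapsMs.stream().filter(g -> g > sessionTimeoutMs).count();
    }
}
```

If some gaps exceed the session timeout, session expirations and the recoveries mentioned above are expected, and the question becomes what stalled during those gaps.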
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241121#comment-13241121 ]

Per Steffensen commented on SOLR-3274:
---

Our performance test (with adminPath="/admin/cores") ran successfully for a while, but then similar errors started to occur. It seems that whenever a SolrCloud cluster (or some of the Solr instances in it) gets into the "no contact to ZK" state, it has a hard time getting out of that state again. The CPUs on the machines are around 70-80% idle while this is going on, so I have a hard time believing that ZK is not responding within 10 secs. But I cannot really pinpoint what goes wrong yet, and we will shift focus for a while, so it will probably be some time before I can get back with solid proof or concrete cases. The only thing I can do for now is encourage you at Apache Solr to run your own stability/robustness/endurance tests with a fairly high concurrent load (not so high that the machines get exhausted) for many days; hopefully you will see the problems occur yourselves.

Thanks for your collaboration!

Regards,
Per Steffensen
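The kind of long-running test Per suggests can be driven by a small harness that keeps a fixed number of threads sending requests for a given time and counts failures instead of stopping at the first one. This is a minimal sketch, assuming each request (for example an update or a query against Solr) is wrapped in a Runnable that throws on failure; EnduranceHarness and its parameters are hypothetical names, not part of any Solr test tooling.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical endurance-test harness (not part of Solr): `threads` workers call
// `request` in a loop until `durationMs` has elapsed, counting successes and failures.
class EnduranceHarness {
    final AtomicLong ok = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    void run(Runnable request, int threads, long durationMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMs);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                // Each worker keeps sending requests until the deadline passes
                // (overflow-safe nanoTime comparison).
                while (System.nanoTime() - deadline < 0) {
                    try {
                        request.run();
                        ok.incrementAndGet();
                    } catch (RuntimeException e) {
                        // Count the failure and keep going, so a long run is not cut short.
                        failed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Give in-flight requests up to a minute after the deadline to finish.
            pool.awaitTermination(durationMs + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

For a multi-day run, durationMs would simply be set to days; sampling `ok` and `failed` periodically from another thread would show when failures start to appear.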
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239208#comment-13239208 ]

Per Steffensen commented on SOLR-3274:
---

Ahhh, I see. So correcting the adminPath will allow it to reconnect AND recover, and then get back to work. Of course, why didn't I conclude that myself? We will correct adminPath, run the performance test again and see how it goes. If we see any problems, we will report them here and try to pinpoint them (and potentially fix them).

Thanks a lot for your help!

Regards,
Per Steffensen
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238455#comment-13238455 ]

Mark Miller commented on SOLR-3274:
---

{quote}
But why not just try to reconnect if/when this situation has occurred, so that Solr can continue doing its work? I guess Solr does not do that, because it seems like once this error has first occurred, there is no "recovering", and certainly (I'm close to 100% positive) ZK will not keep giving 10+ sec response times to all requests, even though it might give a 10+ sec response once in a while.
{quote}

Solr does try to reconnect - but there can be no recovery because of the other issue you posted: you have changed the core admin URL.
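Why a changed core admin URL blocks recovery even after a successful reconnect can be illustrated with a toy check. This sketch assumes, for illustration only, that a recovering replica always contacts the leader at "/admin/cores" (the adminPath Per's team later switched back to) instead of at the leader's configured adminPath; RecoveryUrlCheck and its methods are hypothetical and do not reproduce Solr's actual recovery code.

```java
// Hypothetical illustration (not Solr's recovery code). Assumption: the recovering
// replica builds the leader's core-admin URL from a fixed path, while the leader
// serves core-admin requests at whatever adminPath its solr.xml configures.
class RecoveryUrlCheck {
    static final String ASSUMED_FIXED_ADMIN_PATH = "/admin/cores";

    /** URL the recovering replica would call on the leader. */
    static String recoveryTarget(String leaderBaseUrl) {
        return stripTrailingSlash(leaderBaseUrl) + ASSUMED_FIXED_ADMIN_PATH;
    }

    /** URL at which the leader actually serves core-admin requests. */
    static String servedAt(String leaderBaseUrl, String configuredAdminPath) {
        return stripTrailingSlash(leaderBaseUrl) + configuredAdminPath;
    }

    /** True if recovery requests would reach the leader's core admin handler. */
    static boolean recoveryCanReachLeader(String leaderBaseUrl, String configuredAdminPath) {
        return recoveryTarget(leaderBaseUrl).equals(servedAt(leaderBaseUrl, configuredAdminPath));
    }

    private static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }
}
```

Under this assumption, any adminPath other than "/admin/cores" makes every recovery attempt miss the leader, so the node can reconnect to ZooKeeper yet never finish recovering.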
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238431#comment-13238431 ]

Mark Miller commented on SOLR-3274:
---

bq. U 10 secs is A LOT OF TIME

It really depends - I've seen that timeout exceeded on a heavily loaded machine more than a few times. Then you have to add in any network delays. But yeah, on a fast machine under normal to high load, I have not really run into a problem with this timeout.

bq. Then basically my options are to set up a more responsive ZK cluster or maybe raise the ZK timeout on the Solr side.

That's all I can suggest. If the ZooKeeper client loses the connection, it has up to the session timeout to reconnect. If it reconnects after more than the session timeout has passed, you will get a SessionExpiredException. If that happens, the node will go into recovery, and while it is in recovery it won't serve search requests until recovery is finished - so that could also contribute to the "no servers hosting shard" issue.

Let me know how it goes and if you can pinpoint any problems.
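The sequence Mark describes (reconnect within the session timeout, otherwise SessionExpiredException, recovery, and no searches until recovery finishes) can be written down as a small model. It is a simplification, not Solr's code: it treats the length of the disconnection as known exactly, whereas in practice the ZooKeeper server decides when a session expires. SessionTimeoutModel and its names are hypothetical.

```java
// Simplified, hypothetical model of the sequence described above (not Solr's code).
class SessionTimeoutModel {
    enum Outcome {
        SAME_SESSION,        // reconnected within the session timeout: session survives
        EXPIRED_RECOVERING   // SessionExpiredException: the node goes into recovery
    }

    /** Outcome once the client reconnects after being disconnected for disconnectedMs. */
    static Outcome afterReconnect(long disconnectedMs, long sessionTimeoutMs) {
        // "If more than the session timeout has passed" -> the session is expired.
        return disconnectedMs > sessionTimeoutMs ? Outcome.EXPIRED_RECOVERING : Outcome.SAME_SESSION;
    }

    /** A recovering node does not serve search requests until recovery has finished. */
    static boolean servesSearches(Outcome outcome, boolean recoveryFinished) {
        return outcome == Outcome.SAME_SESSION || recoveryFinished;
    }
}
```

With the 10-second default, a 4-second stall is survived without recovery, while a 12-second stall leaves the node unable to serve searches until it has recovered.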
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238428#comment-13238428 ]

Per Steffensen commented on SOLR-3274:
---

But why not just try to reconnect if/when this situation has occurred, so that Solr can continue doing its work? I guess Solr does not do that, because it seems like once this error has first occurred, there is no "recovering", and certainly (I'm close to 100% positive) ZK will not keep giving 10+ sec response times to all requests, even though it might give a 10+ sec response once in a while.

Regards,
Per Steffensen
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238419#comment-13238419 ]

Per Steffensen commented on SOLR-3274:
---

U 10 secs is A LOT OF TIME. I really wouldn't want to set it higher than that. If ZK is not able to answer within 10 secs, I need to correct something else in my setup. I still believe that Solr might end up in this state (where it "believes" that the connection to ZK is lost) in some other way than actually experiencing a 10+ sec response time from ZK, but I can't prove it (yet). So for now I will just thank you for your kind help and assume that you are correct. Then basically my options are to set up a more responsive ZK cluster or maybe raise the ZK timeout on the Solr side.

Thanks again.

Regards,
Per Steffensen
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238409#comment-13238409 ]

Mark Miller commented on SOLR-3274:
---

bq. not the same smoking gun.

Sorry - actually this does make sense with the other errors. If the ZooKeeper connection is lost, that node is no longer considered live. If that happens to every node hosting a shard (say you have 1 replica and this happened to both nodes), then searches would fail with this error.

> ZooKeeper related SolrCloud problems
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Any
> Reporter: Per Steffensen
> Assignee: Mark Miller
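Mark's reasoning can be made concrete with a toy model. This is a hypothetical sketch, not Solr's cluster-state code: the class `ShardAvailability`, the method `pickLiveNode`, and the node names are invented for illustration. In SolrCloud each node registers an ephemeral znode under `/live_nodes`, so it drops out of the live set when its ZooKeeper session is lost. The model assumes a shard request can only be routed to a node that is both hosting a replica of that shard and still live.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShardAvailability {
    // Hypothetical model: a shard can be searched only if at least one node
    // hosting one of its replicas is still registered as live in ZooKeeper.
    static String pickLiveNode(String shard,
                               Map<String, List<String>> nodesByShard,
                               Set<String> liveNodes) {
        for (String node : nodesByShard.getOrDefault(shard, List.of())) {
            if (liveNodes.contains(node)) {
                return node;
            }
        }
        // Same wording as the Solr error discussed in this thread.
        throw new IllegalStateException("no servers hosting shard: " + shard);
    }

    public static void main(String[] args) {
        // One slice with a leader and one replica on two different Solr nodes.
        Map<String, List<String>> nodesByShard =
                Map.of("shard1", List.of("solr1:8983", "solr2:8983"));

        // Only solr1 lost its ZooKeeper session, so solr2 can still serve shard1.
        System.out.println(pickLiveNode("shard1", nodesByShard, Set.of("solr2:8983")));

        // Both nodes lost their sessions: no live node hosts shard1.
        try {
            pickLiveNode("shard1", nodesByShard, Set.of());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the reported setup (a leader plus one replica per slice), this model only produces the error when both hosting nodes are non-live at the same moment.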
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238404#comment-13238404 ]

Mark Miller commented on SOLR-3274:
---

bq. Can all the exceptions be explained by "connection loss between solr and zookeeper"?

bq. SessionExpiredException

This indicates the connection with ZooKeeper was lost.

bq. org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.

If there is no connection to ZooKeeper, you will see this if you send an update.

bq. org.apache.solr.common.SolrException: no servers hosting shard:

Sami Siren has a JIRA issue about improving this message, I believe, but normally it means that the cluster does not see a single node hosting a given shard. Not sure if this is related to the above - not the same smoking gun.

bq. Can you point me in the direction of how to set it manually?

The default is only 10 seconds. I'd try 30 seconds perhaps? You don't want it too low, but you also don't want it too high if you can help it. I can't remember what the ZooKeeper default is, but I've seen it set as high as 60 seconds in some HBase setups I looked at...

You should be able to set it in solr.xml as an attribute of cores: zkClientTimeout="3" or whatever. That is:

> ZooKeeper related SolrCloud problems
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Any
> Reporter: Per Steffensen
> Assignee: Mark Miller
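`zkClientTimeout` is given in milliseconds, so the 30 seconds suggested here corresponds to a value of 30000. The sketch below is a simplified, hypothetical model (not Solr or ZooKeeper source) of two points from this thread. First, a session expires when the ZooKeeper server hears nothing from the client for longer than the session timeout, for example during a long stall on an overloaded box. Second, the server clamps the requested timeout to `[minSessionTimeout, maxSessionTimeout]`, which default to 2 and 20 times `tickTime`. With the `tickTime=2000` from ZooKeeper's sample configuration, a requested 60 seconds would be negotiated down to 40 seconds unless `maxSessionTimeout` is raised on the server. The class and method names and the 15-second pause are illustrative assumptions.

```java
public class SessionTimeoutModel {
    // ZooKeeper servers clamp the client's requested session timeout to
    // [minSessionTimeout, maxSessionTimeout], which default to 2x and 20x tickTime.
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    // Simplified expiry rule: the session is expired if the server hears nothing
    // from the client (no request or ping) for longer than the session timeout.
    static boolean sessionExpires(long longestSilenceMs, int sessionTimeoutMs) {
        return longestSilenceMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;   // tickTime from ZooKeeper's sample zoo.cfg
        long pauseMs = 15_000;   // arbitrary example of a stall on a busy box
        // 10 s Solr default, 30 s suggestion, 60 s seen in other deployments
        int[] requested = {10_000, 30_000, 60_000};

        for (int requestedMs : requested) {
            int effectiveMs = negotiatedTimeoutMs(requestedMs, tickTimeMs);
            System.out.printf("zkClientTimeout=%d -> negotiated %d ms, %d ms pause expires session: %b%n",
                    requestedMs, effectiveMs, pauseMs, sessionExpires(pauseMs, effectiveMs));
        }
    }
}
```

Under these assumptions, only the 10-second default loses the session during a 15-second stall. This illustrates the trade-off Mark describes: a longer timeout tolerates longer stalls, but it also delays detection of a node that is genuinely gone.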
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238391#comment-13238391 ]

Per Steffensen commented on SOLR-3274:
--

Thanks a lot, Mark!

Can all the exceptions be explained by "connection loss between solr and zookeeper"? I'm not sure I totally buy that explanation. Even though there is a fairly high update/search load on the machines in the cluster, the machines do not seem to be exhausted (CPU idle is well above 0%, more like 50% on average, and IO-wait is not very high). So I would expect plenty of resources to be available for ZK to respond fast. But let's see what happens if we set the timeout higher. Can you point me in the direction of how to set it manually?

Regards, Per Steffensen

> ZooKeeper related SolrCloud problems
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Any
> Reporter: Per Steffensen
> Assignee: Mark Miller
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238381#comment-13238381 ]

Mark Miller commented on SOLR-3274:
---

This happens because the connection between Solr and ZooKeeper is lost, perhaps because the load on the box is too high. I think we may default to a fairly low timeout; it could be raised, both as the default and manually.

> ZooKeeper related SolrCloud problems
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Any
> Reporter: Per Steffensen
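The link between a lost connection and the "Cannot talk to ZooKeeper - Updates are disabled." error can be sketched as a connection-state gate in front of update processing. This is a hypothetical model, not Solr's source. `ZkUpdateGate`, `ConnState`, `onStateChange`, and `processAdd` are invented names. The sketch only mirrors the idea behind `DistributedUpdateProcessor.zkCheck`, the method named in the reported stack trace, and it uses a simplified three-value state instead of the real `ZooKeeper.States` enum.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ZkUpdateGate {
    // Simplified stand-in for the client's connection state; the real
    // ZooKeeper client exposes a richer ZooKeeper.States enum.
    enum ConnState { CONNECTING, CONNECTED, CLOSED }

    private final AtomicReference<ConnState> state = new AtomicReference<>(ConnState.CONNECTED);

    // In a real client this would be driven by watcher events
    // (Disconnected, SyncConnected, Expired).
    void onStateChange(ConnState newState) {
        state.set(newState);
    }

    // Refuse updates whenever the node cannot currently talk to ZooKeeper.
    void zkCheck() {
        if (state.get() != ConnState.CONNECTED) {
            throw new IllegalStateException("Cannot talk to ZooKeeper - Updates are disabled.");
        }
    }

    String processAdd(String docId) {
        zkCheck();
        return "added " + docId;
    }

    public static void main(String[] args) {
        ZkUpdateGate gate = new ZkUpdateGate();
        System.out.println(gate.processAdd("doc1"));

        gate.onStateChange(ConnState.CONNECTING); // connection lost, client reconnecting
        try {
            gate.processAdd("doc2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        gate.onStateChange(ConnState.CONNECTED); // reconnected before the session expired
        System.out.println(gate.processAdd("doc3"));
    }
}
```

Because this gate only checks the current state, updates succeed again as soon as the client reconnects. A longer session timeout gives an overloaded node more room to reconnect before its session, and with it its live-node registration, is lost.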