Re: Frequent recovery of nodes in SolrCloud
Also, the PingRequestHandler is configured as: server-enabled.txt
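On the original question of how to see which nodes are recovering without opening the admin UI, here is a minimal sketch. It assumes the Collections API CLUSTERSTATUS action is available in your Solr version and that a node answers on localhost:8983. The host, port, and function names are placeholders of mine, not anything taken from this thread.

```python
import json
from urllib.request import urlopen

# Assumption: any live Solr node will do; host and port are placeholders.
CLUSTERSTATUS_URL = ("http://localhost:8983/solr/admin/collections"
                     "?action=CLUSTERSTATUS&wt=json")


def non_active_replicas(status):
    """List (collection, shard, replica, node, state) for replicas not 'active'."""
    found = []
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in sorted(collections.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            replicas = shard.get("replicas", {})
            for replica_name, replica in sorted(replicas.items()):
                state = replica.get("state")
                if state != "active":
                    found.append((coll_name, shard_name, replica_name,
                                  replica.get("node_name"), state))
    return found


def fetch_status(url=CLUSTERSTATUS_URL):
    """Fetch and parse the CLUSTERSTATUS response from a running node."""
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hand-written sample shaped like a CLUSTERSTATUS response (not real output).
SAMPLE = {"cluster": {"collections": {"collection1": {"shards": {"shard1": {
    "replicas": {
        "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
        "core_node2": {"state": "recovering", "node_name": "host2:8983_solr"},
    }}}}}}}

for row in non_active_replicas(SAMPLE):
    print(row)
```

Against a live cluster you would call `non_active_replicas(fetch_status())` on a schedule and record the timestamps, which tells you which nodes to look at in the logs and when.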
Re: Frequent recovery of nodes in SolrCloud
From the ZooKeeper side, we have the following configuration:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 3.

On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson wrote:
> And what is your zookeeper timeout? When it's too short that can lead
> to this behavior.
>
> Best,
> Erick
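To put numbers on the ZooKeeper settings above, here is a small sketch of the arithmetic. It relies on two documented ZooKeeper behaviours: initLimit and syncLimit are counted in ticks, and the server clamps a client's requested session timeout to between minSessionTimeout (default 2 x tickTime) and maxSessionTimeout (default 20 x tickTime). The function names and the requested timeouts in the example are my own illustrative values, not settings from this cluster.

```python
ZOO_CFG = """\
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
"""


def parse_zoo_cfg(text):
    """Parse key=value lines from a zoo.cfg style file into a dict of strings."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def zk_timing(cfg):
    """Convert tick-based limits to milliseconds and derive session bounds."""
    tick = int(cfg["tickTime"])
    return {
        "init_limit_ms": int(cfg["initLimit"]) * tick,
        "sync_limit_ms": int(cfg["syncLimit"]) * tick,
        # Defaults apply when the keys are absent from zoo.cfg.
        "min_session_ms": int(cfg.get("minSessionTimeout", 2 * tick)),
        "max_session_ms": int(cfg.get("maxSessionTimeout", 20 * tick)),
    }


def effective_session_timeout(requested_ms, timing):
    """Clamp a client's requested timeout into the server's allowed range."""
    return max(timing["min_session_ms"],
               min(requested_ms, timing["max_session_ms"]))


timing = zk_timing(parse_zoo_cfg(ZOO_CFG))
print(timing)
# Hypothetical requested timeouts in milliseconds, for illustration only.
for requested in (15000, 60000):
    print(requested, "->", effective_session_timeout(requested, timing))
```

With tickTime=2000 the allowed session range is 4000 to 40000 ms, so a Solr zkClientTimeout (given in milliseconds) outside that range would not take effect as written.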
Re: Frequent recovery of nodes in SolrCloud
- Why do you have to keep two nodes on some machines?
  - These are very powerful machines (32-core, 64 GB) and our index size is 1 GB. We allocate 7 GB to each JVM, so we thought it would be OK to run two instances on the same machine.
- Physical hardware or virtual machines?
  - Physical hardware.
- What is the size of this index?
  - 1 GB.
- Is this all on a local network or are there links with potential outages or failures in between?
  - Local network.
- What is the query load?
  - 10K requests per minute.
- Have you had a look at garbage collection?
  - GC time is generally 5-10%. I have attached a screenshot.
- Do you use the internal Zookeeper?
  - No. We have set up an external ZooKeeper ensemble with 3 instances. The ZooKeeper configuration is:

    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=192.168.70.27:2888:3888
    server.2=192.168.70.64:2889:3889
    server.3=192.168.70.26:2889:3889

    Also, in solr.xml, we have zkClientTimeout set to 3.
- How many nodes?
  - 3
- Any observers?
  - I don't know what observers are. Can you please explain?
- What kind of load does Zookeeper show?
  - Load is normal, I guess. I need to double-check.
- How much RAM do these nodes have available?
  - Each Solr node has 7 GB allocated. For ZooKeeper, we have not allocated memory explicitly.
- Do some servers get into swapping?
  - Not sure. How do I check that?

On Fri, Oct 17, 2014 at 2:04 AM, "Jürgen Wagner (DVT)" <juergen.wag...@devoteam.com> wrote:
> Hello,
> you have one shard and 11 replicas? Hmm...
>
> - Why do you have to keep two nodes on some machines?
> - Physical hardware or virtual machines?
> - What is the size of this index?
> - Is this all on a local network or are there links with potential outages
>   or failures in between?
> - What is the query load?
> - Have you had a look at garbage collection?
> - Do you use the internal Zookeeper?
> - How many nodes?
> - Any observers?
> - What kind of load does Zookeeper show?
> - How much RAM do these nodes have available?
> - Do some servers get into swapping?
> - ...
>
> How about some more details in terms of sizing and topology?
>
> Cheers,
> --Jürgen
>
> On 16.10.2014 18:41, sachinpkale wrote:
> > Hi,
> >
> > Recently we have shifted to SolrCloud (4.10.1) from a traditional
> > Master-Slave configuration. We have only one collection and it has only
> > one shard. The cloud cluster contains 12 nodes in total (on 8 machines;
> > on 4 machines we have two instances running on each), out of which one
> > is the leader.
> >
> > Whenever I look at the cluster status using http://:/solr/#/~cloud, it
> > shows at least one node (sometimes 2-3) as recovering. We are using the
> > HAProxy load balancer, and there too it often shows nodes as recovering.
> > This is happening for all nodes in the cluster.
> >
> > What would be the problem here? How do I check this in the logs?
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
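On "Do some servers get into swapping? How do I check that?": a minimal sketch, assuming the Solr and ZooKeeper hosts run Linux, where /proc/meminfo reports SwapTotal and SwapFree in kB. The function names are my own and the sample text is made up for the demonstration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (SwapTotal minus SwapFree)."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live value on a Linux host."""
    with open(path) as handle:
        return swap_used_kb(parse_meminfo(handle.read()))


# Made-up sample so the sketch can be tried on any machine.
SAMPLE = """\
MemTotal:       65843012 kB
MemFree:         1204532 kB
SwapTotal:       8388604 kB
SwapFree:        8000000 kB
"""

print("swap used (kB):", swap_used_kb(parse_meminfo(SAMPLE)))
```

A non-zero figure only shows that something was swapped out at some point. To see whether the box is swapping right now, watch the si/so columns of `vmstat 5`, or the pswpin/pswpout counters in /proc/vmstat, while the recoveries are happening.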
Manual leader election in SolrCloud
Is it possible to elect the leader manually in SolrCloud 4.10.1?

-Sachin-
Re: Master-Slave setup using SolrCloud
Apparently, there is a bug in Solr 4.10.0 which was causing the NullPointerExceptions.
SOLR-6501 <https://issues.apache.org/jira/browse/SOLR-6501>

We have updated our production SOLR to 4.10.1
Re: Master-Slave setup using SolrCloud
If I look into the logs, many times I get only the following line, without any stacktrace:

ERROR - 2014-10-02 19:35:25.516; org.apache.solr.common.SolrException; java.lang.NullPointerException

These exceptions do not come continuously, only once every 10-15 minutes. But once they start, there are 800-1000 such exceptions one after another. Is it related to cache warmup?

I can provide the following information regarding the setup:

We are now using Solr 4.10.0.
Memory allocated to each Solr instance is 7 GB. I guess that is more than sufficient for a 1 GB index, right?
Indexes are stored on a normal local filesystem.
I am using three caches:
Query cache: size 4096, autoWarmCount 2048
Filter cache: size 8192, autoWarmCount 4096
Document cache: size 4096

I am experimenting with maxTime for both soft and hard commits. After reading the following post:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

I set the following:

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:90}</maxTime>
    </autoSoftCommit>

Also, we are getting the following warning many times:

java.lang.NumberFormatException: For input string: "5193.0"

Earlier we were on Solr 4.4.0, and when we upgraded to 4.10.0, we pointed it to the same index we were using for 4.4.0.

On Thu, Oct 2, 2014 at 7:11 PM, Shawn Heisey wrote:
> Your automatic commit settings are fine. If you had tried to use a very
> small maxTime like 1000 (1 second), I would tell you that it's probably
> too short.
>
> The tlogs only get replayed when a core is first started or reloaded.
> These appear to be errors during queries, having nothing at all to do
> with indexing.
>
> I can't be sure with the available information (no Solr version,
> incomplete stacktrace, no info about what request caused and received
> the error), but if I had to guess, I'd say you probably changed your
> schema so that certain fields are now required that weren't required
> before, and didn't reindex, so those fields are not present on every
> document. Or it might be that you added a uniqueKey and didn't reindex,
> and that field is not present on every document.
>
> http://wiki.apache.org/solr/HowToReindex
>
> Thanks,
> Shawn
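To pin down when the NullPointerException bursts happen (and whether they line up with commits or cache warmup), the ERROR lines can be counted per minute. This is a minimal sketch that assumes the log line layout quoted in this thread (`ERROR - <date> <time>; <class>; <message>`). The regular expression and function name are my own and would need adjusting for a different log4j pattern.

```python
import re
from collections import Counter

# Matches the ERROR line layout quoted in this thread; captures "date hh:mm".
NPE_LINE = re.compile(
    r"^ERROR - (\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3}; "
    r".*NullPointerException")


def npe_per_minute(lines):
    """Count NullPointerException ERROR lines per minute of log time."""
    counts = Counter()
    for line in lines:
        match = NPE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two ERROR lines quoted in the thread, plus one stack frame line
# that must not be counted.
SAMPLE = [
    "ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException; "
    "null:java.lang.NullPointerException",
    "at org.apache.solr.handler.component.QueryComponent.returnFields"
    "(QueryComponent.java:1257)",
    "ERROR - 2014-10-02 19:35:25.516; org.apache.solr.common.SolrException; "
    "java.lang.NullPointerException",
]

for minute, count in sorted(npe_per_minute(SAMPLE).items()):
    print(minute, count)
```

On a real log file, pass the open file object instead of `SAMPLE` and compare the busy minutes with the commit and searcher-open messages logged around the same time.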
Master-Slave setup using SolrCloud
Hello,

We are trying to move our traditional master-slave Solr configuration to SolrCloud. As our index size is very small (around 1 GB), we have only one shard. So basically, we have the same master-slave configuration, with one leader and 6 replicas.

We are experimenting with the maxTime of both autoCommit and autoSoftCommit. Currently, autoCommit maxTime is 15 minutes and autoSoftCommit maxTime is 1 minute (let me know if these values do not make sense).

Caches are set such that warmup time is at most 20 seconds.

We have continuous indexing requests, mostly for updating existing documents. A few requests are for deleting or adding documents.

The problem we are facing is that we get very frequent NullPointerExceptions. We get 200-300 such exceptions in a row within a period of 30 seconds, and then for the next few minutes it works fine.

Stacktrace of the NullPointerException:

ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)

I am not sure what is causing it. My guess is that we get these exceptions whenever it tries to replay the tlog. Is anything wrong in my configuration?

-Sachin-
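Since maxTime is given in milliseconds, the 15 minute and 1 minute intervals mentioned above translate as in the sketch below. The property names are the ones used by the `${solr.autoCommit.maxTime:...}` style placeholders in the stock solrconfig.xml. Whether this particular setup reads them from system properties is an assumption, and the helper names are my own.

```python
def minutes_to_ms(minutes):
    """Convert minutes to the millisecond value expected by maxTime."""
    return int(minutes * 60 * 1000)


# Intervals as described in the message above.
SETTINGS = {
    "solr.autoCommit.maxTime": minutes_to_ms(15),     # hard commit
    "solr.autoSoftCommit.maxTime": minutes_to_ms(1),  # soft commit
}


def as_jvm_flags(settings):
    """Render the settings as -D system property flags, sorted by name."""
    return ["-D%s=%d" % (name, value)
            for name, value in sorted(settings.items())]


for flag in as_jvm_flags(SETTINGS):
    print(flag)
```

Passing the values as system properties only has an effect when solrconfig.xml uses the placeholder form. A literal number inside `<maxTime>` ignores them.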