Re: why does a node switch state ?
See: https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists

-- Jack Krupansky

-----Original Message-----
From: veena rani
Sent: Thursday, August 29, 2013 12:18 AM
To: solr-user@lucene.apache.org
Subject: Re: why does a node switch state ?

Kindly stop me from solr mail chain.

Thanks and regards,
Veena

On Wed, Aug 28, 2013 at 12:55 PM, sling wrote:
> [...]

--
Regards,
Veena Rani P N
Banglore.
9538440458
Re: why does a node switch state ?
Kindly stop me from solr mail chain.

Thanks and regards,
Veena

On Wed, Aug 28, 2013 at 12:55 PM, sling wrote:
> [...]

--
Regards,
Veena Rani P N
Banglore.
9538440458
Re: why does a node switch state ?
Hi Daniel,

Thank you very much for your reply. However, my zkTimeout in solr.xml is 30s.

...

--
View this message in context: http://lucene.472066.n3.nabble.com/why-does-a-node-switch-state-tp4086939p4087142.html
Sent from the Solr - User mailing list archive at Nabble.com.
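[Editor's note: for readers unfamiliar with where this timeout lives: in the legacy solr.xml format of the Solr 4.x era, the ZooKeeper session timeout is the zkClientTimeout attribute (in milliseconds) on the <cores> element. A minimal sketch of a 30-second setting; core and host names here are illustrative, and newer Solr versions use a different solr.xml layout:]

```xml
<solr persistent="true">
  <!-- zkClientTimeout is the session timeout Solr negotiates with ZooKeeper.
       If a GC pause exceeds it, ZK may expire the session and trigger an
       election / recovery for the affected node. -->
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```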
Re: why does a node switch state ?
Do you see anything in the Solr logs as to what the trigger for your nodes changing state was? You should see some kind of error/warning before the election is triggered.

My gut feeling would be loss of communication between your leader and ZK (possibly caused by a GC event that locks the JVM for a while), but that's pure conjecture, given you haven't given a lot of information. What is your ZK timeout? You are seeing a 6s GC event, so if that is locking the JVM for that long, and your ZK timeout is less than that, it is likely that ZK thinks that node has gone away, so it forces an election to find a new leader. But there should be evidence of that in the logs; you should see the ZK connection drop.

On 28 August 2013 08:25, sling wrote:
> [...]
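[Editor's note: Daniel's theory can be checked directly against the posted GC log by extracting every pause and comparing it to the session timeout. A minimal sketch; the regex assumes the -XX:+PrintGC line format shown in this thread, and the sample string is taken from the log above:]

```python
import re

# Matches the trailing "<pause> secs]" of each -XX:+PrintGC event, e.g.
# "[Full GC 4354810K->2599431K(4608000K), 6.7833960 secs]"
PAUSE_RE = re.compile(r"\[(Full GC|GC)[^\[\]]*?([\d.]+) secs\]")

def pauses(log_text):
    """Yield (kind, seconds) for every GC event found in the log text."""
    for kind, secs in PAUSE_RE.findall(log_text):
        yield kind, float(secs)

def suspicious(log_text, threshold):
    """Return pauses long enough to risk a ZooKeeper session expiry."""
    return [(k, s) for k, s in pauses(log_text) if s >= threshold]

# Two consecutive events copied from the log posted above:
sample = ("2013-08-28T13:31:02.887+0800: 97264.613: "
          "[GC 4258337K->4354810K(4608000K), 0.1374600 secs] "
          "97264.752: [Full GC 4354810K->2599431K(4608000K), 6.7833960 secs]")
print(suspicious(sample, threshold=5.0))  # the 6.78s Full GC stands out
```

With a 30s zkClientTimeout, the 6-7s Full GCs shown here should not by themselves expire the session, which is why checking the actual pause distribution (and the Solr logs for connection drops) matters before blaming GC.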
why does a node switch state ?
hi,

I have a SolrCloud with 8 JVMs, which has 4 shards (2 nodes for each shard). 1,000,000 docs are indexed per day, with 10 query requests per second, and sometimes there are 100 query requests per second.

In each shard, one JVM has 8G RAM, and the other has 5G.

The JVM args are like this:

-Xmx5000m -Xms5000m -Xmn2500m -Xss1m -XX:PermSize=128m -XX:MaxPermSize=128m
-XX:SurvivorRatio=3 -XX:+UseParNewGC -XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=5
-XX:+UseCMSCompactAtFullCollection -XX:+PrintGCDateStamps -XX:+PrintGC
-Xloggc:log/jvmsolr.log

OR

-Xmx8000m -Xms8000m -Xmn2500m -Xss1m -XX:PermSize=128m -XX:MaxPermSize=128m
-XX:SurvivorRatio=3 -XX:+UseParNewGC -XX:ParallelGCThreads=8
-XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=5
-XX:+UseCMSCompactAtFullCollection -XX:+PrintGC -XX:+PrintGCDateStamps
-Xloggc:log/jvmsolr.log

Nodes work well, but they also switch state every day (at the same time, GC becomes abnormal, like below).

2013-08-28T13:29:39.140+0800: 97180.866: [GC 3770296K->2232626K(4608000K), 0.0099250 secs]
2013-08-28T13:30:09.324+0800: 97211.050: [GC 3765732K->2241711K(4608000K), 0.0124890 secs]
2013-08-28T13:30:29.777+0800: 97231.504: [GC 3760694K->2736863K(4608000K), 0.0695530 secs]
2013-08-28T13:31:02.887+0800: 97264.613: [GC 4258337K->4354810K(4608000K), 0.1374600 secs]
97264.752: [Full GC 4354810K->2599431K(4608000K), 6.7833960 secs]
2013-08-28T13:31:09.884+0800: 97271.610: [GC 2750517K(4608000K), 0.0054320 secs]
2013-08-28T13:31:15.354+0800: 97277.080: [GC 3550474K(4608000K), 0.0871270 secs]
2013-08-28T13:31:31.258+0800: 97292.984: [GC 3877223K(4608000K), 0.1551870 secs]
2013-08-28T13:31:34.396+0800: 97296.123: [GC 3877223K(4608000K), 0.1220380 secs]
2013-08-28T13:31:38.102+0800: 97299.828: [GC 3877225K(4608000K), 0.1545500 secs]
2013-08-28T13:31:40.227+0800: 97303.019: [Full GC 4174941K->2127315K(4608000K), 6.3435150 secs]
2013-08-28T13:31:49.645+0800: 97311.371: [GC 2508466K(4608000K), 0.0355180 secs]
2013-08-28T13:31:57.645+0800: 97319.371: [GC 2967737K(4608000K), 0.0579650 secs]

Even worse, sometimes a shard is down (one node is recovering, another is down); that is an absolute disaster...

Please help me. Any advice is welcome...

--
View this message in context: http://lucene.472066.n3.nabble.com/why-does-a-node-switch-state-tp4086939.html
Sent from the Solr - User mailing list archive at Nabble.com.
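[Editor's note: as a consistency check on the 5G configuration above, the generation sizes implied by the flags can be worked out with standard HotSpot semantics (-Xmn sets the young generation; SurvivorRatio is the eden-to-survivor ratio) and compared with the "(4608000K)" capacity in the log, since the reported capacity excludes one of the two survivor spaces:]

```python
xmx_mb = 5000        # -Xmx5000m: total heap
xmn_mb = 2500        # -Xmn2500m: young generation
survivor_ratio = 3   # -XX:SurvivorRatio=3 -> eden = 3 * each survivor space

# young gen = eden + 2 survivor spaces = (ratio + 2) equal survivor units
survivor_mb = xmn_mb // (survivor_ratio + 2)   # one survivor space
eden_mb = survivor_mb * survivor_ratio
old_mb = xmx_mb - xmn_mb

# HotSpot's GC log reports capacity minus one (always-empty) survivor space:
reported_kb = (xmx_mb - survivor_mb) * 1024
print(survivor_mb, eden_mb, old_mb, reported_kb)
# -> 500 1500 2500 4608000, matching the "(4608000K)" in the log
```

So only ~2500 MB of old generation backs a heap that the log shows holding ~2.6 GB live after Full GC, which is consistent with the back-to-back Full GCs seen above.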