Re: why does a node switch state ?

2013-08-29 Thread Jack Krupansky

See:
https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists

-- Jack Krupansky



Re: why does a node switch state ?

2013-08-28 Thread veena rani
Kindly remove me from the Solr mailing list.

Thanks and regards,
Veena






-- 
Regards,
Veena Rani P N
Banglore.
9538440458


Re: why does a node switch state ?

2013-08-28 Thread sling
Hi Daniel, thank you very much for your reply.
However, my zkTimeout in solr.xml is 30s.


...








--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-does-a-node-switch-state-tp4086939p4087142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: why does a node switch state ?

2013-08-28 Thread Daniel Collins
Do you see anything in the Solr logs indicating what triggered your nodes
to change state?  You should see some kind of error/warning before the
election is triggered.  My gut feeling would be loss of communication
between your leader and ZK (possibly caused by a GC event that locks the
JVM for a while), but that's pure conjecture given that you haven't
provided much information.

What is your ZK timeout?  You are seeing a 6s GC event, so if that is
locking the JVM for that long, and your ZK timeout is less than that, it is
likely that ZK thinks that node has gone away, so it forces an election to
find a new leader.  But there should be evidence of that in the logs: you
should see the ZK connection drop.
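
To make the comparison above concrete, here is a minimal Python sketch that checks stop-the-world pause lengths against a ZK session timeout. The two pause values are the Full GC times from the log in the original post; the 5s timeout is a purely hypothetical value, and the 30s value is the zkTimeout reported elsewhere in this thread. The function name is made up for illustration, and the check is a simplification: real session expiry also depends on heartbeat timing and network delays, so a pause somewhat shorter than the timeout is not automatically safe.

```python
def pauses_exceeding_timeout(pause_secs, zk_timeout_secs):
    """Return the pauses long enough, on their own, to outlast the ZK session timeout."""
    return [p for p in pause_secs if p >= zk_timeout_secs]


# Full GC pause lengths (seconds) taken from the GC log in the original post.
full_gc_pauses = [6.7833960, 6.3435150]

# 5.0 is a hypothetical timeout; 30.0 is the zkTimeout mentioned in this thread.
for timeout in (5.0, 30.0):
    risky = pauses_exceeding_timeout(full_gc_pauses, timeout)
    print(f"zkTimeout={timeout}s -> {len(risky)} pause(s) at or above the timeout")
```

With a 30s timeout neither pause is long enough by itself, so in that case the Solr logs would need to show some other reason for the connection drop.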




why does a node switch state ?

2013-08-28 Thread sling
Hi,
I have a SolrCloud cluster with 8 JVMs, which has 4 shards (2 nodes for each shard).
1,000,000 docs are indexed per day, with 10 query requests per second, and
sometimes perhaps 100 query requests per second.

In each shard, one JVM has 8G of RAM, and the other has 5G.

The JVM args are like this:
-Xmx5000m -Xms5000m -Xmn2500m -Xss1m -XX:PermSize=128m -XX:MaxPermSize=128m
-XX:SurvivorRatio=3 -XX:+UseParNewGC -XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=5
-XX:+UseCMSCompactAtFullCollection -XX:+PrintGCDateStamps -XX:+PrintGC
-Xloggc:log/jvmsolr.log
OR
-Xmx8000m -Xms8000m -Xmn2500m -Xss1m -XX:PermSize=128m -XX:MaxPermSize=128m
-XX:SurvivorRatio=3 -XX:+UseParNewGC -XX:ParallelGCThreads=8
-XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=5
-XX:+UseCMSCompactAtFullCollection -XX:+PrintGC -XX:+PrintGCDateStamps
-Xloggc:log/jvmsolr.log
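
As a rough illustration of what the 5G settings above imply, the following Python sketch works out the generation sizes from -Xmx, -Xmn and -XX:SurvivorRatio. It assumes the usual HotSpot layout (young generation = eden + two survivor spaces, SurvivorRatio = eden / one survivor space) and that the GC log reports capacity without one survivor space; the function name is made up for this example.

```python
def heap_layout_mb(xmx_mb, xmn_mb, survivor_ratio):
    """Return (eden, survivor, old, reported_capacity) in MB for the given flags."""
    # Young generation = eden + 2 survivor spaces, with eden = ratio * survivor.
    survivor = xmn_mb / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    # Whatever is not young generation is left for the old (CMS) generation.
    old = xmx_mb - xmn_mb
    # Assumption: the log's capacity figure leaves out one survivor space,
    # since only one of the two holds live objects at any time.
    reported = xmx_mb - survivor
    return eden, survivor, old, reported


eden, survivor, old, reported = heap_layout_mb(5000, 2500, 3)
print(f"eden={eden}MB survivor={survivor}MB old={old}MB")
print(f"reported capacity={reported * 1024:.0f}K")
```

Under these assumptions the 5G node has about 1500MB of eden, two 500MB survivor spaces and 2500MB of old generation, and the computed capacity of 4608000K agrees with the figure in parentheses in the GC log.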

Nodes work well, but they switch state every day (at the same time, GC
becomes abnormal, as shown below).

2013-08-28T13:29:39.140+0800: 97180.866: [GC 3770296K->2232626K(4608000K), 0.0099250 secs]
2013-08-28T13:30:09.324+0800: 97211.050: [GC 3765732K->2241711K(4608000K), 0.0124890 secs]
2013-08-28T13:30:29.777+0800: 97231.504: [GC 3760694K->2736863K(4608000K), 0.0695530 secs]
2013-08-28T13:31:02.887+0800: 97264.613: [GC 4258337K->4354810K(4608000K), 0.1374600 secs]
97264.752: [Full GC 4354810K->2599431K(4608000K), 6.7833960 secs]
2013-08-28T13:31:09.884+0800: 97271.610: [GC 2750517K(4608000K), 0.0054320 secs]
2013-08-28T13:31:15.354+0800: 97277.080: [GC 3550474K(4608000K), 0.0871270 secs]
2013-08-28T13:31:31.258+0800: 97292.984: [GC 3877223K(4608000K), 0.1551870 secs]
2013-08-28T13:31:34.396+0800: 97296.123: [GC 3877223K(4608000K), 0.1220380 secs]
2013-08-28T13:31:38.102+0800: 97299.828: [GC 3877225K(4608000K), 0.1545500 secs]
2013-08-28T13:31:40.227+0800: 97303.019: [Full GC 4174941K->2127315K(4608000K), 6.3435150 secs]
2013-08-28T13:31:49.645+0800: 97311.371: [GC 2508466K(4608000K), 0.0355180 secs]
2013-08-28T13:31:57.645+0800: 97319.371: [GC 2967737K(4608000K), 0.0579650 secs]
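
For anyone who wants to pull the long pauses out of a log like this one, here is a small Python sketch. The regular expression is written only against the entry shapes visible in the excerpt above (it is not a general HotSpot log parser), the sample text is a subset of those lines, and the names GcEvent and parse_gc_log are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Matches "[GC before->after(capacity), N secs]" and "[Full GC ...]" entries,
# including entries wrapped across lines or reporting a single occupancy figure.
GC_ENTRY = re.compile(
    r"\[(Full GC|GC)\s+(?:(\d+)K->)?(\d+)K\((\d+)K\),\s+([\d.]+)\s+secs\]"
)


class GcEvent(NamedTuple):
    kind: str
    before_kb: Optional[int]  # None when the entry reports one occupancy figure
    after_kb: int
    capacity_kb: int
    pause_secs: float


def parse_gc_log(text):
    """Extract every GC entry found in the given log text."""
    events = []
    for kind, before, after, capacity, secs in GC_ENTRY.findall(text):
        events.append(GcEvent(kind, int(before) if before else None,
                              int(after), int(capacity), float(secs)))
    return events


SAMPLE = """\
2013-08-28T13:31:02.887+0800: 97264.613: [GC 4258337K->4354810K(4608000K),
0.1374600 secs]
97264.752: [Full GC 4354810K->2599431K(4608000K), 6.7833960 secs]
2013-08-28T13:31:09.884+0800: 97271.610: [GC 2750517K(4608000K), 0.0054320
secs]
2013-08-28T13:31:40.227+0800: 97303.019: [Full GC
4174941K->2127315K(4608000K), 6.3435150 secs]
"""

for event in parse_gc_log(SAMPLE):
    if event.kind == "Full GC":
        freed = event.before_kb - event.after_kb
        print(f"Full GC: {event.pause_secs:.2f}s pause, {freed}K freed")
```

On the sample this reports two Full GC events of roughly 6.8s and 6.3s, which are the pauses that stand out against the sub-second collections around them.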

What's more, sometimes a whole shard goes down (one node is recovering and
the other is down), which is an absolute disaster...

Please help me. Any advice is welcome...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-does-a-node-switch-state-tp4086939.html
Sent from the Solr - User mailing list archive at Nabble.com.