[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994365#comment-12994365
 ] 

Patrick Hunt commented on ZOOKEEPER-990:
----------------------------------------

Sounds like it might be some sort of leak, given it only starts happening after 
a few hours of this load profile. A few things to look at:

1) what client type were you running (C or Java)? How many sessions are there, 
and what does the environment look like?

2) are the timeouts correlated to a particular server, or do they occur on any 
server in the cluster? Perhaps the server that's overloaded in ZOOKEEPER-989?

3) review the troubleshooting guide
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
In particular rule out gc/swap (server and client side). Have you tuned the 
server gc at all? (jvm default is non-incremental)

4) use the four letter words to check the latency on the servers and the total 
number of znodes (is the znode count increasing?): 
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands
zktop is very useful for this: https://github.com/phunt/zktop

if the max latency reported by the servers is under your session timeout, the 
clients should not be seeing timeouts...
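
For anyone who wants to script the check in 4) rather than run it by hand, here 
is a minimal sketch. It assumes the server answers the "stat" four letter word 
on its client port with lines of the form "Latency min/avg/max: a/b/c" and 
"Node count: n"; the function names and the returned dictionary keys are my own, 
not part of ZooKeeper.

```python
import re
import socket


def four_letter_word(host, port, cmd="stat", timeout=5.0):
    """Send a four letter word to a ZooKeeper server and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection once it has sent the full reply.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Pull latency (ms) and znode count out of 'stat' output.

    Assumes the 'Latency min/avg/max:' and 'Node count:' line formats;
    fields that are not found are simply left out of the result.
    """
    stats = {}
    latency = re.search(
        r"^Latency min/avg/max:\s*([\d.]+)/([\d.]+)/([\d.]+)", text, re.M
    )
    if latency:
        stats["latency_min_ms"] = float(latency.group(1))
        stats["latency_avg_ms"] = float(latency.group(2))
        stats["latency_max_ms"] = float(latency.group(3))
    nodes = re.search(r"^Node count:\s*(\d+)", text, re.M)
    if nodes:
        stats["node_count"] = int(nodes.group(1))
    return stats


def latency_exceeds_timeout(stats, session_timeout_ms):
    """True if the server's max latency is at or above the session timeout."""
    return stats["latency_max_ms"] >= session_timeout_ms
```

Something like parse_stat(four_letter_word("zk1.example.com", 2181)) (the host 
name is a placeholder) could be run against each server every minute or so. 
Logging node_count over time shows whether the znode count keeps growing, and 
latency_exceeds_timeout flags a server whose max latency has reached the 
session timeout.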

5) use visualvm or similar (jconsole?) to monitor one or more of the servers 
where clients are seeing timeouts. Do you see anything unusual wrt gc/heap/etc.?

6) attach logs from a server/client at the time where you see the issues. that 
will give us more insight.


> random session timeout when there is a large number of sessions
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-990
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-990
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>            Reporter: Xiaowei Jiang
>
> When there is a large number of sessions, random session timeouts start after 
> a few hours. It happens even though the load on the server is small (less than 
> 1 out of 8 process busy and plenty of memory). Increasing the timeout to 300 
> seconds only delays this; the session timeout eventually happens.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
