[
https://issues.apache.org/jira/browse/ZOOKEEPER-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994365#comment-12994365
]
Patrick Hunt commented on ZOOKEEPER-990:
----------------------------------------
Sounds like it might be some sort of leak, given it only starts happening after
a few hours of this load profile. A few things to look at:
1) what client type were you running? (c/java?). How many sessions and what
does the environment look like.
2) are the timeouts correlated to a particular server, or any server in the
cluster? Perhaps the server that's overloaded in ZOOKEEPER-989 ?
3) review the troubleshooting guide
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
In particular rule out gc/swap (server and client side). Have you tuned the
server gc at all? (jvm default is non-incremental)
4) use the 4 letter words to identify what the latency (and total number of
znodes - is this increasing?) is on the servers :
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands
zktop is very useful for this https://github.com/phunt/zktop
if the max latency is under your timeout the clients should not be seeing
timeouts...
5) use visualvm or similar (jconsole?) to monitor one/more of the servers where
clients are seeing timeouts, do you see anything unusual wrt gc/heap/etc...
6) attach logs from a server/client at the time where you see the issues. that
will give us more insight.
> random session timeout when there is a large number of sessions
> ---------------------------------------------------------------
>
> Key: ZOOKEEPER-990
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-990
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.2
> Reporter: Xiaowei Jiang
>
> When there is large number of sessions, random session timeout starts after a
> few hours. It happens even though the load on the server is small (less than
> 1 out of 8 process busy and plenty of memory). Increase the timeout to 300
> seconds only delays this but the session timeout eventually happens.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira