It occurs to me that this may be related to the large number of TIME_WAIT 
connections we’ve seen before [1].  Since this error was opening and closing 
several connections per second, it ended up leaving lots of connections in 
TIME_WAIT.

I just thought I would mention this in case it comes up again. I don’t think we 
looked at Kafka/Logstash last time the TIME_WAIT problem occurred.
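
If it does come up again, a quick way to check would be to count the TIME_WAIT 
sockets that involve the Zookeeper port. Below is a minimal sketch, assuming a 
Linux host where /proc/net/tcp is readable; the function name is made up for 
illustration, and 2181 is the port shown in the log below.

```python
TIME_WAIT = "06"  # hex state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text, port):
    """Count TIME_WAIT sockets whose local or remote port equals `port`.

    `proc_net_tcp_text` is the full text of /proc/net/tcp (IPv4 only).
    """
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # Addresses look like "0100007F:0885": hex IP, colon, hex port.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if fields[3] == TIME_WAIT and port in (local_port, remote_port):
            count += 1
    return count
```

To use it, read /proc/net/tcp and call count_time_wait(text, 2181). A count that 
keeps climbing while the Zookeeper log shows the accept/close loop would support 
the connection-churn explanation.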

[1] https://issues.apache.org/jira/browse/AIRAVATA-2321

On May 26, 2017, at 4:47 PM, Christie, Marcus Aaron 
<[email protected]> wrote:

Dev,

This message is just to document what happened today in the SGRC dev 
environment.  But if you have any insight on what caused this to happen, please 
share.

TLDR: Zookeeper log fills disk, Logstash is spamming ZK with requests, not sure 
what caused it but have reconfigured ZK logging to rotate files and not fill up 
disk.

So today in our dev environment the Zookeeper server’s log file filled up the 
disk. The size was about 190GB.  It wasn’t being rotated, so it may have been 
growing for a while. On the other hand, I saw in the log file that there were 
several messages per second that looked like this:

2017-05-26 11:42:35,070 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46462 (no session established for client)
2017-05-26 11:42:35,070 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:46464
2017-05-26 11:42:35,071 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
causing close of session 0x0 due to java.io.IOException: Unreasonable length = 
1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46464 (no session established for client)
2017-05-26 11:42:35,071 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:46466
2017-05-26 11:42:35,071 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
causing close of session 0x0 due to java.io.IOException: Unreasonable length = 
1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46466 (no session established for client)
2017-05-26 11:42:35,072 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:46468
2017-05-26 11:42:35,072 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
causing close of session 0x0 due to java.io.IOException: Unreasonable length = 
1684371039
2017-05-26 11:42:35,072 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46468 (no session established for client)
…


I shut down api-orch and gfac and Kafka (which is just pushing log messages to 
Logstash).  I then deleted the ./data directory in the Zookeeper installation 
and restarted it. I was still getting the error messages above.

Eventually I found that Logstash was apparently trying to make Zookeeper 
connections, so I shut it down as well.  Once I shut down Logstash, the error 
messages stopped in the Zookeeper log.

It’s hard to say whether the problem was
1. Logstash inappropriately sending overly large messages to Zookeeper, or
2. Zookeeper’s log file filling up the disk and causing Zookeeper’s database to 
become corrupted. Once the disk fills up, all sorts of weird behavior can start 
to manifest.
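
One cheap check that bears on option 1 is to look at the "Unreasonable length" 
value itself. The sketch below assumes the value was read off the wire as a 
signed big-endian 32-bit integer (Java's default byte order); the helper names 
are made up for illustration.

```python
import struct


def length_field_bytes(length):
    """Return the 4 bytes that encode `length` as a big-endian signed int."""
    return struct.pack(">i", length)


def looks_like_text(raw):
    """True if every byte is a printable ASCII character."""
    return all(0x20 <= b < 0x7F for b in raw)


raw = length_field_bytes(1684371039)
print(raw, looks_like_text(raw))  # b'dev_' True
```

The four bytes spell "dev_", which looks more like the start of a text string 
than a real length field. That would be consistent with a client that is not 
speaking the Zookeeper protocol writing to port 2181, though it does not prove 
it and says nothing about why Logstash would be doing so.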

I’ve reconfigured the Zookeeper logging to write to a rolling log file that 
rotates at 10MB and keeps at most 10 rotated log files.  That should prevent 
running out of disk space. I used [1] as a resource.  I created issue 
AIRAVATA-2411 [2] to incorporate this into our Ansible scripts.


Thanks,

Marcus


[1] 
https://community.hortonworks.com/content/supportkb/49091/zookeeperout-file-keeps-growing-until-restarted.html
[2] https://issues.apache.org/jira/browse/AIRAVATA-2411

