Dev,

This message is mainly to document what happened today in the SGRC dev 
environment, but if you have any insight into what caused it, please share.

TLDR: The Zookeeper log filled the disk while Logstash was spamming ZK with 
requests. I'm not sure what caused it, but I have reconfigured ZK logging to 
rotate files so it can no longer fill up the disk.

Today in our dev environment the Zookeeper server’s log file filled up the 
disk. It was about 190GB. It wasn’t being rotated, so it may have been growing 
for a while. On the other hand, the log file showed several messages a second 
that looked like this:

2017-05-26 11:42:35,070 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46462 (no session established for client)
2017-05-26 11:42:35,070 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:46464
2017-05-26 11:42:35,071 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
causing close of session 0x0 due to java.io.IOException: Unreasonable length = 
1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46464 (no session established for client)
2017-05-26 11:42:35,071 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:46466
2017-05-26 11:42:35,071 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
causing close of session 0x0 due to java.io.IOException: Unreasonable length = 
1684371039
2017-05-26 11:42:35,071 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46466 (no session established for client)
2017-05-26 11:42:35,072 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:46468
2017-05-26 11:42:35,072 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
causing close of session 0x0 due to java.io.IOException: Unreasonable length = 
1684371039
2017-05-26 11:42:35,072 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket 
connection for client /127.0.0.1:46468 (no session established for client)
…
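
One observation on the "Unreasonable length" value: Zookeeper's wire format 
uses 4-byte big-endian length fields, so when a client that does not speak the 
Zookeeper protocol connects, part of its payload can end up being read as a 
length. A minimal Python sketch (the helper name length_as_ascii is made up 
for illustration) turns the number from the log back into its four bytes:

```python
def length_as_ascii(length: int) -> str:
    """Render a 32-bit big-endian length field as the text of its 4 bytes."""
    raw = length.to_bytes(4, byteorder="big")
    # Replace non-printable bytes with '.' so binary junk stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The value reported in the WARN lines above.
print(length_as_ascii(1684371039))  # prints "dev_"
```

The value decodes to "dev_", which looks like the start of a text string 
rather than a real length. That would be consistent with some client sending 
plain text to port 2181, but it is only a hint and not a confirmed cause.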


I shut down api-orch, gfac and Kafka (which is just pushing log messages to 
Logstash).  I then deleted the ./data directory in the Zookeeper installation 
and restarted it. I was still getting the error messages above.

Eventually I found that Logstash was apparently trying to make Zookeeper 
connections, so I shut it down as well. Once Logstash was down, the error 
messages stopped appearing in the Zookeeper log.

It’s hard to say whether the problem was
1. Logstash inappropriately sending oversized messages to Zookeeper, or
2. Zookeeper’s log file filling up the disk and causing Zookeeper’s database to 
become corrupted. Once the disk fills up, all sorts of weird behavior can start 
to manifest.

I’ve reconfigured Zookeeper to log to a rotating file that rolls over at 10MB 
and keeps at most 10 rotated log files. That should prevent it from running 
out of disk space. I used [1] as a resource and created issue AIRAVATA-2411 
[2] to incorporate this into our Ansible scripts.
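
For reference, a sketch of the kind of settings involved, assuming the stock 
conf/log4j.properties that ships with Zookeeper 3.4.x (log4j 1.2 syntax). The 
exact file in the dev environment may differ, so treat this as an illustration 
of the 10MB / 10 file limits rather than the literal change:

```properties
# Rolling file appender: roll over at 10MB, keep at most 10 old files.
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
```

The root logger also has to point at this appender. With the stock scripts I 
believe that means setting ZOO_LOG4J_PROP="INFO,ROLLINGFILE" in the 
environment that zkServer.sh runs with; otherwise output keeps going to the 
console and ends up in the unrotated zookeeper.out file.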


Thanks,

Marcus


[1] https://community.hortonworks.com/content/supportkb/49091/zookeeperout-file-keeps-growing-until-restarted.html
[2] https://issues.apache.org/jira/browse/AIRAVATA-2411
