[graylog2] Graylog 1.0.2: Graylog server or elasticsearch goes down and then refuses to restart

Mike Hogan Thu, 28 May 2015 05:55:12 -0700

Hello,

I was using 0.9.7 (I think) for a number of months, then upgraded to 
1.0.2 in the last couple of weeks.  Since then, occasionally (once a 
day) the Graylog UI becomes unresponsive.  When I check the status 
I get this:


root@graylog:/var/log/graylog/server# graylog-ctl status
down: elasticsearch: 0s, normally up, want up; run: log: (pid 1204) 911220s
run: etcd: (pid 4722) 177186s; run: log: (pid 1193) 911221s
run: graylog-server: (pid 13509) 160101s; run: log: (pid 1191) 911221s
run: graylog-web: (pid 4734) 177185s; run: log: (pid 1190) 911221s
run: mongodb: (pid 4792) 177184s; run: log: (pid 1192) 911221s
run: nginx: (pid 4806) 177184s; run: log: (pid 1208) 911220s

which suggests elasticsearch is down.
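
A minimal sketch of how the check above could be automated, assuming the status lines keep the "state: service:" shape shown in the output. The function name `down_services` and the regular expression are illustrative, not part of graylog-ctl:

```python
import re

# Matches the leading "<state>: <service>:" of each status line,
# e.g. "down: elasticsearch:" or "run: graylog-server:".
STATUS_LINE = re.compile(r"^(run|down): ([\w-]+):")

def down_services(status_output):
    """Return the names of services whose status line starts with 'down:'."""
    down = []
    for line in status_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and match.group(1) == "down":
            down.append(match.group(2))
    return down

# Three lines copied from the status output above.
sample = """\
down: elasticsearch: 0s, normally up, want up; run: log: (pid 1204) 911220s
run: etcd: (pid 4722) 177186s; run: log: (pid 1193) 911221s
run: graylog-server: (pid 13509) 160101s; run: log: (pid 1191) 911221s
"""
print(down_services(sample))  # prints ['elasticsearch']
```

Feeding it the real output of graylog-ctl status (for example from a cron job) would be one way to record when elasticsearch actually goes down.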

When I restart:

root@graylog:/var/log/graylog/server# graylog-ctl restart
ok: run: elasticsearch: (pid 22945) 0s
ok: run: etcd: (pid 22955) 0s
timeout: run: graylog-server: (pid 13509) 160288s, got TERM
ok: run: graylog-web: (pid 23055) 0s
ok: run: mongodb: (pid 23091) 0s
ok: run: nginx: (pid 23096) 0s

So elasticsearch comes up, but graylog-server refuses to.  Issuing the 
restart command a second time gives the same results.  Issuing the stop 
command also times out:

root@graylog:/var/log/graylog/server# graylog-ctl stop
ok: down: elasticsearch: 0s, normally up
ok: down: etcd: 0s, normally up
timeout: run: graylog-server: (pid 13509) 160328s, want down, got TERM
ok: down: graylog-web: 0s, normally up
ok: down: mongodb: 0s, normally up
ok: down: nginx: 1s, normally up

To get back up and running, I have to kill -9 the graylog-server process 
and then run graylog-ctl start.
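
The manual workaround could be sketched roughly as follows, assuming the "timeout: run: ..." line format shown in the restart and stop output. The helper names `stuck_processes` and `recovery_commands` are hypothetical, and the sketch only builds the commands rather than running them:

```python
import re

# Matches lines such as:
#   timeout: run: graylog-server: (pid 13509) 160288s, got TERM
TIMEOUT_LINE = re.compile(r"^timeout: run: ([\w-]+): \(pid (\d+)\)")

def stuck_processes(ctl_output):
    """Return (service, pid) pairs for services reported as 'timeout'."""
    stuck = []
    for line in ctl_output.splitlines():
        match = TIMEOUT_LINE.match(line.strip())
        if match:
            stuck.append((match.group(1), int(match.group(2))))
    return stuck

def recovery_commands(ctl_output):
    """Build, but do not run, the kill -9 / graylog-ctl start sequence."""
    commands = [["kill", "-9", str(pid)] for _, pid in stuck_processes(ctl_output)]
    if commands:
        # Only restart when something actually had to be killed.
        commands.append(["graylog-ctl", "start"])
    return commands

# Two lines copied from the restart output above.
restart_output = """\
ok: run: etcd: (pid 22955) 0s
timeout: run: graylog-server: (pid 13509) 160288s, got TERM
"""
for command in recovery_commands(restart_output):
    print(" ".join(command))
```

Each command list could be passed to subprocess.run once the output has been reviewed; kill -9 gives the JVM no chance to shut down cleanly, so this is a stopgap rather than a fix.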

Today I am not sure what time the service went down, but I had millions of 
these in the lead up:

org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]

Followed by a mix of these exceptions:

* org.elasticsearch.node.NodeClosedException: node closed 
[graylog2-server][4aLJKPqeR2CCwR84ZO6I9w][graylog][inet[/10.4.11.143:9350]]{client=true, 
data=false, master=false}

* com.mongodb.MongoException$Network: Read operation to server 
127.0.0.1:27017 failed on database graylog

* com.mongodb.MongoTimeoutException: Timed out after 10000 ms while waiting 
to connect. Client view of cluster state is {type=Unknown, 
servers=[{address=127.0.0.1:27017, type=Unknown, state=Connecting, 
exception={com.mongodb.MongoException$Network: Exception opening the 
socket}, caused by {java.net.ConnectException: Connection refused}}]
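
One way to see which exception appears first and how often is to tally the exception class names in the server log. This is a rough sketch; the function name `count_exceptions` and the pattern are assumptions for illustration, and the sample lines are the ones quoted above:

```python
import re
from collections import Counter

# A dotted Java class name whose class part starts with a capital letter
# and ends in "Exception", optionally followed by a nested class
# such as "$Network".
EXCEPTION_NAME = re.compile(r"\b(?:[a-z_]\w*\.)+(?=[A-Z])\w*Exception(?:\$\w+)?")

def count_exceptions(lines):
    """Count how often each fully qualified exception name occurs."""
    return Counter(
        name for line in lines for name in EXCEPTION_NAME.findall(line)
    )

log_lines = [
    "org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]",
    "org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]",
    "com.mongodb.MongoException$Network: Read operation to server "
    "127.0.0.1:27017 failed on database graylog",
]
for name, count in count_exceptions(log_lines).most_common():
    print(count, name)
```

Running it over an open log file (file objects iterate line by line) gives a quick picture of whether the Elasticsearch or the MongoDB errors dominate.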

I am running in an EC2 environment, with AMIs created using packer, using 
the scripts 
at https://github.com/Graylog2/graylog2-images/tree/master/packer with some 
local extensions.

Is this related to any known issues?  If not, can you offer help/advice on 
how I should go about getting to the bottom of the issue?

Many thanks,
Mike.

