Hi Mike,

from the error messages it looks like MongoDB is also going down from time to time. Could you check the memory consumption on that box and look through the 'dmesg' output for OOM-killer activity?
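A minimal sketch of that check, assuming a Linux box where `free` and `dmesg` are available. The function names `oom_lines` and `check_memory` are made up for illustration, and the grep patterns cover the usual kernel OOM message fragments rather than every possible wording:

```shell
# Print only the lines from stdin that look like OOM-killer activity.
# "|| true" keeps the exit status at 0 when nothing matches.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process' || true
}

# Show memory and swap usage in MiB, then any OOM-killer traces
# found in the kernel ring buffer (dmesg may need root on some systems).
check_memory() {
    free -m
    dmesg | oom_lines
}
```

Running `check_memory` as root on the Graylog box prints the memory table followed by any matching kernel messages. No matches only means nothing is left in the ring buffer; older entries may already have rotated out.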

On 28 May 2015 at 14:04, Mike Hogan <[email protected]> wrote:

> Hello,
>
> I was using 0.9.7 (I think) for a number of months, then I upgraded to
> 1.0.2 in the last couple of weeks. Now I am finding that occasionally
> (once a day) the graylog ui becomes unresponsive. When I check the status
> I get this:
>
> root@graylog:/var/log/graylog/server# graylog-ctl status
> down: elasticsearch: 0s, normally up, want up; run: log: (pid 1204) 911220s
> run: etcd: (pid 4722) 177186s; run: log: (pid 1193) 911221s
> run: graylog-server: (pid 13509) 160101s; run: log: (pid 1191) 911221s
> run: graylog-web: (pid 4734) 177185s; run: log: (pid 1190) 911221s
> run: mongodb: (pid 4792) 177184s; run: log: (pid 1192) 911221s
> run: nginx: (pid 4806) 177184s; run: log: (pid 1208) 911220s
>
> which suggests elasticsearch is down.
>
> When I restart:
>
> root@graylog:/var/log/graylog/server# graylog-ctl restart
> ok: run: elasticsearch: (pid 22945) 0s
> ok: run: etcd: (pid 22955) 0s
> timeout: run: graylog-server: (pid 13509) 160288s, got TERM
> ok: run: graylog-web: (pid 23055) 0s
> ok: run: mongodb: (pid 23091) 0s
> ok: run: nginx: (pid 23096) 0s
>
> So elasticsearch comes up, but graylog-server refuses to. Issuing the
> restart command a second time gives the same results. Issuing the stop
> command also times out:
>
> root@graylog:/var/log/graylog/server# graylog-ctl stop
> ok: down: elasticsearch: 0s, normally up
> ok: down: etcd: 0s, normally up
> timeout: run: graylog-server: (pid 13509) 160328s, want down, got TERM
> ok: down: graylog-web: 0s, normally up
> ok: down: mongodb: 0s, normally up
> ok: down: nginx: 1s, normally up
>
> To get back running, I have to kill -9 the graylog server, followed by
> graylog-ctl start.
>
> Today I am not sure what time the service went down, but I had millions of
> these in the lead up:
>
> org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
>
> Followed by a mix of these exceptions:
>
> * org.elasticsearch.node.NodeClosedException: node closed
> [graylog2-server][4aLJKPqeR2CCwR84ZO6I9w][graylog][inet[/10.4.11.143:9350]]{client=true,
> data=false, master=false}
>
> * com.mongodb.MongoException$Network: Read operation to server
> 127.0.0.1:27017 failed on database graylog
>
> * com.mongodb.MongoTimeoutException: Timed out after 10000 ms while
> waiting to connect. Client view of cluster state is {type=Unknown,
> servers=[{address=127.0.0.1:27017, type=Unknown, state=Connecting,
> exception={com.mongodb.MongoException$Network: Exception opening the
> socket}, caused by {java.net.ConnectException: Connection refused}}]
>
> I am running in an EC2 environment, with AMIs created using packer, using
> the scripts at
> https://github.com/Graylog2/graylog2-images/tree/master/packer with some
> local extensions.
>
> Is this related to any known issues? If not, can you offer help/advice on
> how I should go about getting to the bottom of the issue?
>
> Many thanks,
> Mike.
>
> --
> You received this message because you are subscribed to the Google Groups
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

--
Developer

Tel.: +49 (0)40 609 452 077
Fax.: +49 (0)40 609 452 078

TORCH GmbH - A Graylog Company
Steckelhörn 11
20457 Hamburg
Germany

https://www.graylog.com <https://www.torch.sh/>

Commercial Reg. (Registergericht): Amtsgericht Hamburg, HRB 125175
Managing Director (Geschäftsführer): Lennart Koopmann (CEO)
