Hi Mike,

judging by the error messages, MongoDB also seems to go down from time
to time. Could you check the memory consumption on that box and look
through the 'dmesg' output for signs of the OOM killer?
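
A minimal sketch of the 'dmesg' check suggested above, assuming a Linux box with Python 3 available. The marker phrases are the ones the kernel commonly logs around an OOM kill, but the exact wording varies between kernel versions, so treat them as an assumption. The function names are hypothetical helpers, not part of Graylog or graylog-ctl.

```python
import re
import subprocess

# Phrases the Linux kernel typically logs around an OOM kill
# (assumed wording; it differs slightly between kernel versions).
OOM_MARKERS = ("invoked oom-killer", "Out of memory", "Killed process")

# Matches e.g. "Killed process 1234 (java)" and captures PID and name.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_lines(log_text):
    """Return every log line that contains one of the OOM markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in OOM_MARKERS)
    ]


def killed_processes(log_text):
    """Return (pid, name) pairs for processes the kernel reports as killed."""
    return [(int(pid), name) for pid, name in KILLED_RE.findall(log_text)]


def read_kernel_log():
    """Return the output of 'dmesg'; this may require root on some systems."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `killed_processes(read_kernel_log())` as root would list any processes the kernel killed since boot. If mongod or one of the Java processes shows up there, the box is most likely short on memory, and `free -m` gives a quick view of current consumption.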

On 28 May 2015 at 14:04, Mike Hogan <[email protected]> wrote:

> Hello,
>
> I was using 0.9.7 (I think) for a number of months, then I upgraded to
> 1.0.2 in the last couple of weeks.  Now I am finding that occasionally
> (once a day) the graylog ui becomes unresponsive.  When I check the status
> I get this:
>
> root@graylog:/var/log/graylog/server# graylog-ctl status
> down: elasticsearch: 0s, normally up, want up; run: log: (pid 1204) 911220s
> run: etcd: (pid 4722) 177186s; run: log: (pid 1193) 911221s
> run: graylog-server: (pid 13509) 160101s; run: log: (pid 1191) 911221s
> run: graylog-web: (pid 4734) 177185s; run: log: (pid 1190) 911221s
> run: mongodb: (pid 4792) 177184s; run: log: (pid 1192) 911221s
> run: nginx: (pid 4806) 177184s; run: log: (pid 1208) 911220s
>
> which suggests elasticsearch is down.
>
> When I restart:
>
> root@graylog:/var/log/graylog/server# graylog-ctl restart
> ok: run: elasticsearch: (pid 22945) 0s
> ok: run: etcd: (pid 22955) 0s
> timeout: run: graylog-server: (pid 13509) 160288s, got TERM
> ok: run: graylog-web: (pid 23055) 0s
> ok: run: mongodb: (pid 23091) 0s
> ok: run: nginx: (pid 23096) 0s
>
> So elasticsearch comes up, but graylog-server refuses to.  Issuing the
> restart command a second time gives the same results.  Issuing the stop
> command also times out:
>
> root@graylog:/var/log/graylog/server# graylog-ctl stop
> ok: down: elasticsearch: 0s, normally up
> ok: down: etcd: 0s, normally up
> timeout: run: graylog-server: (pid 13509) 160328s, want down, got TERM
> ok: down: graylog-web: 0s, normally up
> ok: down: mongodb: 0s, normally up
> ok: down: nginx: 1s, normally up
>
> To get back up and running, I have to kill -9 the graylog-server process
> and then run graylog-ctl start.
>
> Today I am not sure what time the service went down, but I had millions of
> these in the lead up:
>
> org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
>
> Followed by a mix of these exceptions:
>
> * org.elasticsearch.node.NodeClosedException: node closed
> [graylog2-server][4aLJKPqeR2CCwR84ZO6I9w][graylog][inet[/10.4.11.143:9350]]{client=true,
> data=false, master=false}
>
> * com.mongodb.MongoException$Network: Read operation to server
> 127.0.0.1:27017 failed on database graylog
>
> * com.mongodb.MongoTimeoutException: Timed out after 10000 ms while
> waiting to connect. Client view of cluster state is {type=Unknown,
> servers=[{address=127.0.0.1:27017, type=Unknown, state=Connecting,
> exception={com.mongodb.MongoException$Network: Exception opening the
> socket}, caused by {java.net.ConnectException: Connection refused}}]
>
> I am running in an EC2 environment, with AMIs created using packer, using
> the scripts at
> https://github.com/Graylog2/graylog2-images/tree/master/packer with some
> local extensions.
>
> Is this related to any known issues?  If not, can you offer help/advice on
> how I should go about getting to the bottom of the issue?
>
> Many thanks,
> Mike.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Developer

Tel.: +49 (0)40 609 452 077
Fax.: +49 (0)40 609 452 078

TORCH GmbH - A Graylog Company
Steckelhörn 11
20457 Hamburg
Germany

https://www.graylog.com <https://www.torch.sh/>

Commercial Reg. (Registergericht): Amtsgericht Hamburg, HRB 125175
Geschäftsführer: Lennart Koopmann (CEO)
