Thanks Jan, comments inline.
On Sunday, June 19, 2016 at 7:55:05 AM UTC-4, Jan Doberstein wrote:
>
> Hi,
>
> what happens if you reboot the Server? What happens if you restart the
> Service?
>

Same behavior: all services start, but graylog-server behaves exactly as
described above, both after a reboot and after a service restart.

>
> What happens if you kill the curl and try to restart graylog-server?
>

Aha, thanks for pointing that out. graylog-server starts; the full steps are
below.

It looks like graylog-server keeps trying to reach a local MongoDB
(127.0.0.1:27017) for 10 minutes before timing out. Why is that? Could it be
a bug? This is the setting in graylog.conf:

# MongoDB Configuration
mongodb_uri = mongodb://10.20.1.229:27017/graylog

Why is it trying localhost? This instance of graylog-server is a
slave/secondary server that connects to the master's MongoDB.

If this is a bug in Graylog, kindly re-open this
<https://github.com/Graylog2/graylog2-server/issues/2370> ticket. Otherwise,
please let me know what I should do to avoid this 10-minute check against
localhost.

Thanks again.

Full steps:

ubuntu@graylog-server2:~$ sudo graylog-ctl stop
ok: down: elasticsearch: 0s, normally up
ok: down: etcd: 0s, normally up
ok: down: graylog-server: 0s, normally up
ok: down: nginx: 1s, normally up
ubuntu@graylog-server2:~$ sudo graylog-ctl status
down: elasticsearch: 13s, normally up; run: log: (pid 1023) 316859s
down: etcd: 13s, normally up; run: log: (pid 1013) 316859s
down: graylog-server: 8s, normally up; run: log: (pid 1010) 316859s
down: nginx: 8s, normally up; run: log: (pid 1015) 316859s
ubuntu@graylog-server2:~$ sudo graylog-ctl start
ok: run: elasticsearch: (pid 12883) 0s
ok: run: etcd: (pid 12907) 0s
ok: run: graylog-server: (pid 12919) 1s
ok: run: nginx: (pid 12925) 0s
ubuntu@graylog-server2:~$ ps -elf | grep 12919
0 S root 12919 1004 0 80 0 - 1110 - 14:12 ? 00:00:00 /bin/sh ./run
0 S root 12920 12919 0 80 0 - 2154 - 14:12 ? 00:00:00 timeout 600 bash -c until curl -s http://127.0.0.1:27017; do sleep 1; done
0 S ubuntu 12963 1582 0 80 0 - 2615 pipe_w 14:12 pts/0 00:00:00 grep --color=auto 12919
ubuntu@graylog-server2:~$ kill 12920
-bash: kill: (12920) - Operation not permitted
ubuntu@graylog-server2:~$ sudo !!
sudo kill 12920
ubuntu@graylog-server2:~$ sudo graylog-ctl status
run: elasticsearch: (pid 12883) 34s; run: log: (pid 1023) 316937s
run: etcd: (pid 12907) 34s; run: log: (pid 1013) 316937s
run: graylog-server: (pid 12919) 34s; run: log: (pid 1010) 316937s
run: nginx: (pid 12925) 33s; run: log: (pid 1015) 316937s
ubuntu@graylog-server2:~$ ps -elf | grep 12919
4 S graylog 12919 1004 46 80 0 - 1109405 - 14:12 ? 00:00:18 /opt/graylog/embedded/jre/bin/java -Xms1g -Xmx1500m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configurationFile=file:///opt/graylog/conf/log4j2.xml -Djava.library.path=/opt/graylog/server/lib/sigar/ -Dgraylog2.installation_source=unknown /opt/graylog/server/graylog.jar server -f /opt/graylog/conf/graylog.conf
0 S ubuntu 13133 1582 0 80 0 - 2615 pipe_w 14:13 pts/0 00:00:00 grep --color=auto 12919

>
> with kind regards
> Jan
>
> --
> | -----------------------------------------------------------------
> | get trusted and secure VPN services http://jalogis.ch/vpnsh
>
> On 17 June 2016 at 19:15:20, 123Dev (hr...@123loadboard.com) wrote:
> >
> > We've upgraded our production system (AWS images) from 1.3.x to 2.0.2.
> > On the primary server, graylog-server is fully operational, whereas on
> > the secondary server the process seems to be running, but it is not
> > writing anything to the logs and it does not appear in the UI as a
> > node.
> >
> > On the trouble server, sudo graylog-ctl status shows:
> >
> > run: elasticsearch: (pid 1036) 480s; run: log: (pid 1032) 480s
> > run: etcd: (pid 1033) 480s; run: log: (pid 1028) 480s
> > run: graylog-server: (pid 1029) 480s; run: log: (pid 1024) 480s
> > run: nginx: (pid 1025) 480s; run: log: (pid 1022) 480s
> >
> > As shown, graylog-server is running with pid 1029.
> >
> > But if we check the process with pid 1029, ps -elf | grep 1029 shows:
> >
> > 0 S root 1029 1018 0 80 0 - 1110 - 21:26 ? 00:00:00 /bin/sh ./run
> > 0 S root 1039 1029 0 80 0 - 2154 - 21:26 ? 00:00:00 timeout 600 bash -c until curl -s http://127.0.0.1:27017; do sleep 1; done
> > 0 S ubuntu 2638 2524 0 80 0 - 2616 pipe_w 21:35 pts/0 00:00:00 grep --color=auto 1029
> >
> > which is clearly not the graylog-server process.
> >
> > If we check the same thing on the primary server, where everything is
> > working fine, sudo graylog-ctl status shows:
> >
> > run: elasticsearch: (pid 12071) 1318s; run: log: (pid 1037) 333246s
> > run: etcd: (pid 12090) 1317s; run: log: (pid 1035) 333246s
> > run: graylog-server: (pid 12125) 1312s; run: log: (pid 1038) 333246s
> > run: mongodb: (pid 12132) 1311s; run: log: (pid 1036) 333246s
> > run: nginx: (pid 12134) 1311s; run: log: (pid 1039) 333246s
> >
> > and ps -elf | grep 12125 shows:
> >
> > 4 S graylog 12125 1031 28 80 0 - 1169685 - 21:13 ? 00:06:14 /opt/graylog/embedded/jre/bin/java -Xms1g -Xmx1500m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configurationFile=file:///opt/graylog/conf/log4j2.xml -Djava.library.path=/opt/graylog/server/lib/sigar/ -Dgraylog2.installation_source=unknown /opt/graylog/server/graylog.jar server -f /opt/graylog/conf/graylog.conf
> > 0 S ubuntu 17847 1419 0 80 0 - 2615 pipe_w 21:35 pts/1 00:00:00 grep --color=auto 12125
> >
> > Here graylog-server is clearly running.
> >
> > So my questions are:
> >
> > - Why does graylog-ctl think that graylog-server is running?
> > - Why is graylog-server not running?
> > - How can we narrow down the root issue? With graylog-server not
> >   running, the log files are not updated, so we have no clue what is
> >   going on.
> > - Are there higher-level logs for graylog-ctl that would tell us what
> >   is going wrong when it tries to start graylog-server?
> >
> > PS: We noticed that after a long while, graylog-server eventually shows
> > up as a node in the UI, and the logs start filling.
> >
> > Looking for errors in the logs, we only noticed the following warning:
> >
> > 2016-06-17_17:04:56.90879 2016-06-17 17:04:56,908 WARN : org.graylog2.shared.events.DeadEventLoggingListener - Received unhandled event of type from event bus
> >
> > We're not even certain it has any relevance to the problem of
> > graylog-server not starting immediately.
> >
> > Any guidance on how to narrow this down is greatly appreciated.
> >
> > Thanks
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Graylog Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to graylog2+u...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/graylog2/530ffc00-1742-4eea-994a-d5e95c165e88%40googlegroups.com.
> >
> > For more options, visit https://groups.google.com/d/optout.
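In the transcript above, pid 12919 is first "/bin/sh ./run" and, once the wait loop is killed, the same pid 12919 is the java process. That is consistent with the run script replacing itself with java via exec, which would also explain why graylog-ctl reports graylog-server as running while it is still waiting. The actual contents of the Graylog run script are not shown in this thread, so this is an assumption. The sketch below is only a generic POSIX sh illustration that exec keeps the PID:

```shell
#!/bin/sh
# Illustration only (not the Graylog run script): `exec` replaces the
# running shell with a new program but keeps the same PID, so a supervisor
# such as runit keeps reporting one PID for a ./run script that ends in exec.

# The outer sh prints its PID, then execs a new sh that prints its own PID.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

before=$(printf '%s\n' "$pids" | sed -n 1p)
after=$(printf '%s\n' "$pids" | sed -n 2p)
echo "PID before exec: $before, PID after exec: $after"
```

Both numbers printed are the same, which matches what graylog-ctl status shows for graylog-server before and after the wait loop ends.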
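The ps output shows the wait loop polling 127.0.0.1:27017, while graylog.conf points mongodb_uri at 10.20.1.229:27017. To check by hand whether the configured MongoDB is reachable from the secondary, a similar loop can be pointed at the host taken from mongodb_uri. This is a hypothetical sketch, not part of graylog-ctl or the Graylog package; it assumes bash and coreutils timeout are installed, that mongodb_uri names a single host with an explicit port and no credentials, and it uses bash's /dev/tcp to test the TCP connection instead of curl:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of graylog-ctl).

# Print "HOST PORT" for the first uncommented mongodb_uri line in the given
# config file. Assumes a single-host URI with an explicit port and no
# credentials, e.g. mongodb://10.20.1.229:27017/graylog; prints nothing otherwise.
mongo_host_port() {
  sed -n 's|^[[:space:]]*mongodb_uri[[:space:]]*=[[:space:]]*mongodb://\([^:/]*\):\([0-9]*\).*|\1 \2|p' "$1" | head -n 1
}

# Wait up to $3 seconds (default 600, the value seen in the ps output) until
# HOST:PORT accepts a TCP connection, retrying once per second.
# Returns 0 on success, or timeout's status (124) if the time limit expires.
wait_for_port() {
  timeout "${3:-600}" bash -c \
    'until (exec 3<>"/dev/tcp/$0/$1") 2>/dev/null; do sleep 1; done' "$1" "$2"
}
```

For example, on the secondary: set -- $(mongo_host_port /opt/graylog/conf/graylog.conf); wait_for_port "$1" "$2" 30 && echo "MongoDB at $1:$2 is reachable". Host and port are passed to the inner bash as positional arguments rather than spliced into the command string, so unusual characters in the config cannot change the command being run.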