We've upgraded our production system (AWS images) from 1.3.x to 2.0.2. On the primary server, Graylog Server is fully operational, whereas on the secondary server the process is running (or seems to be), but it's not writing anything to the logs and it does not appear in the UI as a node.
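One way to check whether the pid that `graylog-ctl status` reports for graylog-server really is the Java server, and not a wrapper around it, is to print the whole process tree under that pid. Below is a minimal sketch. It assumes a Linux host and reads /proc directly instead of calling ps. The function names `cmdline_of`, `children_of` and `print_tree` are hypothetical helpers, not part of graylog-ctl.

```bash
#!/usr/bin/env bash
# Sketch (Linux only): print a pid and every descendant, indented by depth,
# reading /proc directly so no extra tools are needed.

# Print the command line of pid $1, with the NUL separators turned into spaces.
# Returns 1 if the pid does not exist.
cmdline_of() {
  local cmd
  [[ -r /proc/$1/cmdline ]] || return 1
  cmd=$(tr '\0' ' ' < "/proc/$1/cmdline")
  printf '%s\n' "${cmd% }"
}

# Print the pids whose parent is $1, taken from the "PPid:" line of each
# /proc/<pid>/status file.
children_of() {
  local status key value
  for status in /proc/[0-9]*/status; do
    [[ -r $status ]] || continue
    while read -r key value; do
      if [[ $key == PPid: ]]; then
        if [[ $value == "$1" ]]; then
          status=${status#/proc/}
          printf '%s\n' "${status%/status}"
        fi
        break
      fi
    done < "$status"
  done
}

# Recursively print pid $1 and its descendants, two spaces of indent per level.
# Returns 1 if $1 does not exist.
print_tree() {
  local pid=$1 depth=${2:-0} cmd child
  cmd=$(cmdline_of "$pid") || return 1
  printf '%*s%s %s\n' $((depth * 2)) '' "$pid" "$cmd"
  for child in $(children_of "$pid"); do
    print_tree "$child" $((depth + 1))
  done
}
```

For example, `print_tree 1029` on the trouble server would be expected to list `/bin/sh ./run` with the `timeout 600 ...` wait loop beneath it, whereas `print_tree 12125` on the primary should show the java process itself.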
On the trouble server,

    sudo graylog-ctl status

shows

    run: elasticsearch: (pid 1036) 480s; run: log: (pid 1032) 480s
    run: etcd: (pid 1033) 480s; run: log: (pid 1028) 480s
    run: graylog-server: (pid 1029) 480s; run: log: (pid 1024) 480s
    run: nginx: (pid 1025) 480s; run: log: (pid 1022) 480s

As seen, graylog-server is running with pid 1029. But if we check the processes with pid 1029,

    ps -elf | grep 1029

shows

    0 S root 1029 1018 0 80 0 - 1110 - 21:26 ? 00:00:00 /bin/sh ./run
    0 S root 1039 1029 0 80 0 - 2154 - 21:26 ? 00:00:00 timeout 600 bash -c until curl -s http://127.0.0.1:27017; do sleep 1; done
    0 S ubuntu 2638 2524 0 80 0 - 2616 pipe_w 21:35 pts/0 00:00:00 grep --color=auto 1029

which is clearly *not* the graylog-server process.

If we check the same thing on the primary server, where everything is working fine,

    sudo graylog-ctl status

shows

    run: elasticsearch: (pid 12071) 1318s; run: log: (pid 1037) 333246s
    run: etcd: (pid 12090) 1317s; run: log: (pid 1035) 333246s
    run: graylog-server: (pid 12125) 1312s; run: log: (pid 1038) 333246s
    run: mongodb: (pid 12132) 1311s; run: log: (pid 1036) 333246s
    run: nginx: (pid 12134) 1311s; run: log: (pid 1039) 333246s

and

    ps -elf | grep 12125

shows

    4 S graylog 12125 1031 28 80 0 - 1169685 - 21:13 ? 00:06:14 /opt/graylog/embedded/jre/bin/java -Xms1g -Xmx1500m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configurationFile=file:///opt/graylog/conf/log4j2.xml -Djava.library.path=/opt/graylog/server/lib/sigar/ -Dgraylog2.installation_source=unknown /opt/graylog/server/graylog.jar server -f /opt/graylog/conf/graylog.conf
    0 S ubuntu 17847 1419 0 80 0 - 2615 pipe_w 21:35 pts/1 00:00:00 grep --color=auto 12125

Here the graylog-server process is clearly running.

So our questions are:

- Why does graylog-ctl think that graylog-server is running?
- Why is graylog-server not running?
- How can we narrow down the root cause?
  With graylog-server not running, the log files are not updated, so we have no clue what is going on.
- Are there higher-level logs for graylog-ctl that would tell us what is going wrong when it tries to start graylog-server?

PS: We noticed that after a long while, the Graylog server eventually shows up as a node in the UI and the logs start filling. Looking for errors in the logs, we only noticed the following warning:

    2016-06-17_17:04:56.90879 2016-06-17 17:04:56,908 WARN : org.graylog2.shared.events.DeadEventLoggingListener - Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>

We're not even certain it has any relevance to graylog-server not starting immediately.

Thanks; any guidance on how to narrow this down is greatly appreciated.
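The ps output from the trouble server shows the supervised `run` script (pid 1029) sitting in `timeout 600 bash -c until curl -s http://127.0.0.1:27017; do sleep 1; done`, i.e. waiting up to 600 seconds for something to answer on 127.0.0.1:27017 (MongoDB's default port), presumably before it starts Java. Note also that the trouble server's `graylog-ctl status` output lists no mongodb service, while the primary's does. Below is a minimal sketch for repeating that check by hand and timing it. It swaps curl for bash's built-in `/dev/tcp` redirection, and the function name `wait_for_port` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: wait until HOST:PORT accepts a TCP connection, for at most
# MAX_SECONDS, mirroring the "until curl ...; do sleep 1; done" loop seen in
# the ps output but using bash's /dev/tcp instead of curl.
#
# Usage: wait_for_port HOST PORT MAX_SECONDS
# Prints the elapsed whole seconds and returns 0 once the port answers;
# returns 1 (printing nothing) if MAX_SECONDS pass without a connection.
wait_for_port() {
  local host=$1 port=$2 max=$3
  local start=$SECONDS
  while (( SECONDS - start < max )); do
    # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connect; the
    # subshell fails if the connection is refused. On 127.0.0.1 a closed port
    # is refused at once; a firewalled remote host could make this hang.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo $(( SECONDS - start ))
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, `wait_for_port 127.0.0.1 27017 600` on the trouble server would either print how long it took for something to start listening on that port or give up after 10 minutes, which may be worth comparing with the delayed startup mentioned in the PS.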