That's not good :/ because graylog2 is behaving strangely. It has been running for approx. 17 hours now, and these warnings appear every few seconds:

2015-02-11 09:40:59,393 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3015 milliseconds)
2015-02-11 09:41:03,190 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3024 milliseconds)
2015-02-11 09:41:06,986 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3181 milliseconds)
2015-02-11 09:41:11,303 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3048 milliseconds)
2015-02-11 09:41:11,304 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS MarkSweep took longer than 1 second (last duration=159306 milliseconds)
2015-02-11 09:41:15,169 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2575 milliseconds)
2015-02-11 09:41:19,652 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2838 milliseconds)
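To see in detail what those pauses are doing, it may help to turn on GC logging for the graylog2-server JVM. A minimal sketch, assuming your startup script honours a JAVA_OPTS-style variable (the variable name and log path below are assumptions; adjust for your init script):

```shell
# Hypothetical addition to the graylog2-server startup environment.
# These are standard HotSpot (Java 7) flags; the log path is a placeholder.
JAVA_OPTS="$JAVA_OPTS -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/graylog2/gc.log"
```

The resulting gc.log would show whether the long pauses are frequent young collections (PS Scavenge) or full collections (PS MarkSweep), and how full the heap is before each run.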
As seen in the HQ plugin for ES, the cluster is fine. The search query/fetch could be faster ...

Summary
  Node Name:          host1.test.local        host3.test.local        host2.test.local        graylog.test.local
  IP Address:         192.168.0.1:9300        192.168.0.3:9300        192.168.0.2:9300        192.168.0.3:9350
  Node ID:            Fbiyz9krQq-KkhxxxI5NQQ  N-GgJy4aR1ecMxxxJPIE2Q  aWH0zgNSRJuoQZAxxxbUSQ  DJdD6KJtSM6uoxxxCGTIoQ
  ES Uptime:          0.69 days               0.69 days               0.69 days               0.69 days

File System
  Store Size:         12.4GB        12.4GB        12.4GB        0.0
  # Documents:        52,278,572    52,278,572    52,278,572    0
  Documents Deleted:  0%            0%            0%            0%
  Merge Size:         13.1GB        12.8GB        13.1GB        0.0
  Merge Time:         00:18:35      00:17:08      00:18:15      00:00:00
  Merge Rate:         12.6 MB/s     13.4 MB/s     12.9 MB/s     0 MB/s
  File Descriptors:   565           561           555           366

Index Activity
  Indexing - Index:   0.71ms        1.06ms        0.71ms        0ms
  Indexing - Delete:  0ms           0ms           0ms           0ms
  Search - Query:     1031.5ms      1076.11ms     965.14ms      0ms
  Search - Fetch:     47.5ms        61ms          101ms         0ms
  Get - Total:        0ms           0ms           0ms           0ms
  Get - Exists:       0ms           0ms           0ms           0ms
  Get - Missing:      0ms           0ms           0ms           0ms
  Refresh:            3.97ms        3.53ms        3.99ms        0ms
  Flush:              31.64ms       54.56ms       33.04ms       0ms

Cache Activity
  Field Size:         92.9MB        93.4MB        93.2MB        0.0
  Field Evictions:    0             0             0             0
  Filter Cache Size:  1.4KB         1.4KB         144.0B        0.0
  Filter Evictions:   0 per query   0 per query   0 per query   0 per query
  ID Cache Size:
  % ID Cache:         0%            0%            0%            0%

Memory
  Total Memory:       16 gb         16 gb         16 gb         0 gb
  Heap Size:          5.9 gb        5.9 gb        5.9 gb        0.1 gb
  Heap % of RAM:      38.1%         38.1%         38.1%         0%
  % Heap Used:        8%            13.1%         10.7%         80.4%
  GC MarkSweep Frequency:           0 s     0 s     0 s     0 s
  GC MarkSweep Duration:            0ms     0ms     0ms     0ms
  GC ParNew Frequency:              0 s     0 s     0 s     0 s
  GC ParNew Duration:               0ms     0ms     0ms     0ms
  G1 GC Young Generation Freq:      0 s     0 s     0 s     0 s
  G1 GC Young Generation Duration:  0ms     0ms     0ms     0ms
  G1 GC Old Generation Freq:        0 s     0 s     0 s     0 s
  G1 GC Old Generation Duration:    0ms     0ms     0ms     0ms
  Swap Space:         0.0000 mb     0.0000 mb     0.0000 mb     undefined mb

Network
  HTTP Connection Rate:  0 /second  0 /second  0 /second  0 /second

Any ideas where the problems
with the GC come from?

On Tuesday, February 10, 2015 at 3:57:25 PM UTC+1, Arie wrote:
>
> Not 100% sure; I read about it and it looks fine.
> We are running with an explicit master node.
>
> I now see I was confused by your question, because it seems more graylog related.
> Looking at your config I am not seeing strange things.
>
> On Tuesday, February 10, 2015 at 3:17:49 PM UTC+1, Christoph Fürstaller wrote:
>>
>> Correct! But the other two could take this role if the master goes down.
>> Am I right? So my setup is fine. Or do I misunderstand something?
>>
>> On Tuesday, February 10, 2015 at 2:51:20 PM UTC+1, Arie wrote:
>>>
>>> When running, only one server can be master. This server regulates
>>> all the logic of your ES cluster, and is the one that graylog is talking to.
>>>
>>> On Tuesday, February 10, 2015 at 2:04:52 PM UTC+1, Christoph Fürstaller wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for the configuration documentation.
>>>>
>>>> Can I really run into split brain? I have 3 nodes, and they are all
>>>> equal. Every one of them can be a master and will store data. With
>>>> discovery.zen.minimum_master_nodes: 2 I can't get a split brain. Or am
>>>> I wrong? Or is this setup not ideal?
>>>>
>>>> Chris...
>>>>
>>>> On Tuesday, February 10, 2015 at 1:38:06 PM UTC+1, Arie wrote:
>>>>>
>>>>> You could bump into a split-brain situation running all ES nodes as
>>>>> master.
>>>>>
>>>>> Check out this to configure your cluster:
>>>>>
>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_important_configuration_changes.html#_minimum_master_nodes
>>>>>
>>>>> On Tuesday, February 10, 2015 at 12:09:33 AM UTC+1, Christoph Fürstaller wrote:
>>>>>>
>>>>>> Thanks for your answer!
>>>>>>
>>>>>> About the master/data nodes: what happens when the master goes down?
>>>>>> Will one of the 'slaves' become a master?
>>>>>> I configured all 3 as master for redundancy, so the cluster still
>>>>>> survives if only one node is present. Is this assumption wrong?
>>>>>>
>>>>>> I've increased the ES_HEAP_SIZE to 6G before, with the same results.
>>>>>>
>>>>>> Chris...
>>>>>>
>>>>>> On Monday, February 9, 2015 at 8:30:28 PM UTC+1, Arie wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Looking at your config in elasticsearch.yml, the following comes to mind.
>>>>>>>
>>>>>>> One node should be:
>>>>>>> node.master: true
>>>>>>> node.data: true
>>>>>>>
>>>>>>> and for the other two nodes:
>>>>>>> node.master: false
>>>>>>> node.data: false
>>>>>>>
>>>>>>> In the elasticsearch conf file:
>>>>>>> ES_HEAP_SIZE
>>>>>>>
>>>>>>> You can take this easily up to 8G (50% of your memory) and check
>>>>>>> that it is really running that way. In my case, on CentOS 6, I put
>>>>>>> this in /etc/conf.d/elasticsearch.
>>>>>>>
>>>>>>> Good luck.
>>>>>>>
>>>>>>> On Friday, February 6, 2015 at 12:58:27 PM UTC+1, Christoph Fürstaller wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Yesterday we updated our Graylog2/Elasticsearch cluster. The
>>>>>>>> Elasticsearch cluster consists of 3 physical machines: DL380 G7, E5620,
>>>>>>>> 16GB RAM, on RHEL 6.6. Each ES node gets 4GB RAM. On one host the
>>>>>>>> graylog2 server/interface is installed. Until yesterday we used
>>>>>>>> Elasticsearch 0.90.10-1 and graylog2-0.20.3. Yesterday we updated
>>>>>>>> graylog2 to 0.90.0, started everything, and everything was running fine.
>>>>>>>> Then we stopped graylog2 and the Elasticsearch cluster and upgraded ES
>>>>>>>> to 1.3.4 and graylog to 0.92.4. The ES upgrade was successful; after
>>>>>>>> that we started graylog2, which connected to the cluster and showed
>>>>>>>> everything.
>>>>>>>>
>>>>>>>> In the ES cluster there are 7 indices with about 20 million messages
>>>>>>>> each. The last 3 indices are open, the others closed. Graylog2 sees
>>>>>>>> approx. 50 million messages.
>>>>>>>> New messages arrive at approx. 5 msg/sec.
>>>>>>>>
>>>>>>>> In the logs from graylog2-server there are messages like this,
>>>>>>>> every couple of minutes:
>>>>>>>> org.graylog2.periodical.GarbageCollectionWarningThread - Last GC
>>>>>>>> run with PS Scavenge took longer than 1 second
>>>>>>>>
>>>>>>>> It seems graylog is running fine, a bit slow on searches, but fine.
>>>>>>>>
>>>>>>>> Attached are the config files for graylog2 and elasticsearch.
>>>>>>>>
>>>>>>>> Can someone give us a hint where these warnings come from? What can
>>>>>>>> we tweak? That would be very helpful!
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Chris...

--
You received this message because you are subscribed to the Google Groups "graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
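For reference, the three-equal-nodes setup discussed in this thread can be sketched in elasticsearch.yml roughly as follows. This is only a sketch for ES 1.x: the cluster name and host list are placeholders, not taken from the attached configs.

```yaml
# Sketch for each of the 3 equal nodes (cluster name and hosts are placeholders)
cluster.name: graylog2
node.master: true    # every node is master-eligible
node.data: true      # every node stores data
# With 3 master-eligible nodes, a quorum of 2 prevents split brain:
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["host1.test.local", "host2.test.local", "host3.test.local"]
```

With minimum_master_nodes set to 2, a partitioned single node cannot elect itself master, so the cluster survives the loss of any one node but never forms two independent masters.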
