I found a solution for my problem with the GC. I was using an old Java version, 1.7.0_21; with the newer version 1.7.0_71 the GC warnings are gone and everything is running fine!
On Wednesday, February 11, 2015 at 9:54:59 AM UTC+1, Christoph Fürstaller wrote:
>
> That's not good :/
> Graylog2 is behaving strangely. It's been running for approx. 17 hours now,
> and these warnings appear every few seconds:
>
> 2015-02-11 09:40:59,393 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3015 milliseconds)
> 2015-02-11 09:41:03,190 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3024 milliseconds)
> 2015-02-11 09:41:06,986 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3181 milliseconds)
> 2015-02-11 09:41:11,303 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3048 milliseconds)
> 2015-02-11 09:41:11,304 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS MarkSweep took longer than 1 second (last duration=159306 milliseconds)
> 2015-02-11 09:41:15,169 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2575 milliseconds)
> 2015-02-11 09:41:19,652 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2838 milliseconds)
>
> As seen in the HQ plugin for ES, the cluster is fine. The search query/fetch
> could be faster ...
>
> Summary (columns: host1.test.local | host3.test.local | host2.test.local | graylog.test.local)
>
> IP Address: 192.168.0.1:9300 | 192.168.0.3:9300 | 192.168.0.2:9300 | 192.168.0.3:9350
> Node ID: Fbiyz9krQq-KkhxxxI5NQQ | N-GgJy4aR1ecMxxxJPIE2Q | aWH0zgNSRJuoQZAxxxbUSQ | DJdD6KJtSM6uoxxxCGTIoQ
> ES Uptime: 0.69 days | 0.69 days | 0.69 days | 0.69 days
>
> File System
> Store Size: 12.4GB | 12.4GB | 12.4GB | 0.0
> # Documents: 52,278,572 | 52,278,572 | 52,278,572 | 0
> Documents Deleted: 0% | 0% | 0% | 0%
> Merge Size: 13.1GB | 12.8GB | 13.1GB | 0.0
> Merge Time: 00:18:35 | 00:17:08 | 00:18:15 | 00:00:00
> Merge Rate: 12.6 MB/s | 13.4 MB/s | 12.9 MB/s | 0 MB/s
> File Descriptors: 565 | 561 | 555 | 366
>
> Index Activity
> Indexing - Index: 0.71ms | 1.06ms | 0.71ms | 0ms
> Indexing - Delete: 0ms | 0ms | 0ms | 0ms
> Search - Query: 1031.5ms | 1076.11ms | 965.14ms | 0ms
> Search - Fetch: 47.5ms | 61ms | 101ms | 0ms
> Get - Total: 0ms | 0ms | 0ms | 0ms
> Get - Exists: 0ms | 0ms | 0ms | 0ms
> Get - Missing: 0ms | 0ms | 0ms | 0ms
> Refresh: 3.97ms | 3.53ms | 3.99ms | 0ms
> Flush: 31.64ms | 54.56ms | 33.04ms | 0ms
>
> Cache Activity
> Field Size: 92.9MB | 93.4MB | 93.2MB | 0.0
> Field Evictions: 0 | 0 | 0 | 0
> Filter Cache Size: 1.4KB | 1.4KB | 144.0B | 0.0
> Filter Evictions: 0 per query | 0 per query | 0 per query | 0 per query
> ID Cache Size:
> % ID Cache: 0% | 0% | 0% | 0%
>
> Memory
> Total Memory: 16 gb | 16 gb | 16 gb | 0 gb
> Heap Size: 5.9 gb | 5.9 gb | 5.9 gb | 0.1 gb
> Heap % of RAM: 38.1% | 38.1% | 38.1% | 0%
> % Heap Used: 8% | 13.1% | 10.7% | 80.4%
> GC MarkSweep Frequency: 0 s | 0 s | 0 s | 0 s
> GC MarkSweep Duration: 0ms | 0ms | 0ms | 0ms
> GC ParNew Frequency: 0 s | 0 s | 0 s | 0 s
> GC ParNew Duration: 0ms | 0ms | 0ms | 0ms
> G1 GC Young Generation Freq: 0 s | 0 s | 0 s | 0 s
> G1 GC Young Generation Duration: 0ms | 0ms | 0ms | 0ms
> G1 GC Old Generation Freq: 0 s | 0 s | 0 s | 0 s
> G1 GC Old Generation Duration: 0ms | 0ms | 0ms | 0ms
> Swap Space: 0.0000 mb | 0.0000 mb | 0.0000 mb | undefined mb
>
> Network
> HTTP Connection Rate: 0 /second | 0 /second | 0 /second | 0 /second
>
> Any ideas where the problems with the GC come from?
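As a side note, the pause durations buried in warnings like the ones quoted above can be pulled out with a quick script. A minimal sketch; the regex only targets the `last duration=N milliseconds` suffix that Graylog's GarbageCollectionWarningThread emits:

```python
import re

# Matches the "last duration=<n> milliseconds" suffix of the GC warning lines.
DURATION_RE = re.compile(r"last duration=(\d+) milliseconds")

def gc_pause_stats(log_lines):
    """Return (count, max_ms, mean_ms) of GC pauses found in the given lines."""
    pauses = [int(m.group(1)) for line in log_lines
              if (m := DURATION_RE.search(line))]
    if not pauses:
        return (0, 0, 0.0)
    return (len(pauses), max(pauses), sum(pauses) / len(pauses))

warnings = [
    "... PS Scavenge took longer than 1 second (last duration=3015 milliseconds)",
    "... PS MarkSweep took longer than 1 second (last duration=159306 milliseconds)",
]
print(gc_pause_stats(warnings))  # (2, 159306, 81160.5)
```

Run against the warnings above, the 159-second PS MarkSweep pause stands out immediately; pauses that long usually point at the JVM rather than at Elasticsearch itself.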
> On Tuesday, February 10, 2015 at 3:57:25 PM UTC+1, Arie wrote:
>>
>> Not 100% sure; I read about it and it looks fine. We are running with an
>> explicit master node.
>>
>> I now see I was confused by your question, because it seems more
>> graylog-related. Looking at your config, I am not seeing strange things.
>>
>> On Tuesday, February 10, 2015 at 3:17:49 PM UTC+1, Christoph Fürstaller wrote:
>>>
>>> Correct! But the other two could take this role if the master goes down.
>>> Am I right? So my setup is fine. Or do I misunderstand something?
>>>
>>> On Tuesday, February 10, 2015 at 2:51:20 PM UTC+1, Arie wrote:
>>>>
>>>> When running, only one server can be master. This server regulates all
>>>> the logic of your ES cluster, and it is the one that graylog is talking to.
>>>>
>>>> On Tuesday, February 10, 2015 at 2:04:52 PM UTC+1, Christoph Fürstaller wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for the configuration docs.
>>>>>
>>>>> Can I really run into split brain? I have 3 nodes, and they are all
>>>>> equal: every one of them can be a master and will store data. With
>>>>> discovery.zen.minimum_master_nodes: 2 I can't get a split brain. Or am
>>>>> I wrong? Or is this setup not ideal?
>>>>>
>>>>> Chris...
>>>>>
>>>>> On Tuesday, February 10, 2015 at 1:38:06 PM UTC+1, Arie wrote:
>>>>>>
>>>>>> You could bump into a split-brain situation running all ES nodes as
>>>>>> master.
>>>>>>
>>>>>> Check out this to configure your cluster:
>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_important_configuration_changes.html#_minimum_master_nodes
>>>>>>
>>>>>> On Tuesday, February 10, 2015 at 12:09:33 AM UTC+1, Christoph Fürstaller wrote:
>>>>>>>
>>>>>>> Thanks for your answer!
>>>>>>>
>>>>>>> About the master/data nodes: what happens when the master goes down?
>>>>>>> Will one of the 'slaves' become a master?
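For reference, the quorum rule behind `discovery.zen.minimum_master_nodes` discussed above is simply a majority of master-eligible nodes, (N / 2) + 1, as described in the linked Elasticsearch guide. A small sketch:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for ES zen discovery: majority of master-eligible nodes."""
    return master_eligible // 2 + 1

# With 3 master-eligible nodes, as in the cluster discussed in this thread,
# quorum is 2, so an isolated single node can never elect itself master.
print(minimum_master_nodes(3))  # 2
```

So the setting of 2 for this 3-node cluster matches the recommendation, which is why the GC problem turned out to lie elsewhere.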
>>>>>>> I configured all 3 as master for redundancy, so the cluster still
>>>>>>> survives if only one node is present. Is this assumption wrong?
>>>>>>>
>>>>>>> I've increased the ES_HEAP_SIZE to 6G before, with the same results.
>>>>>>>
>>>>>>> Chris...
>>>>>>>
>>>>>>> On Monday, February 9, 2015 at 8:30:28 PM UTC+1, Arie wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Looking at your config in elasticsearch.yml, the following comes to
>>>>>>>> mind.
>>>>>>>>
>>>>>>>> One node should be:
>>>>>>>> node.master: true
>>>>>>>> node.data: true
>>>>>>>>
>>>>>>>> and the other two nodes:
>>>>>>>> node.master: false
>>>>>>>> node.data: false
>>>>>>>>
>>>>>>>> In elasticsearch.conf, ES_HEAP_SIZE: you can take this easily up to
>>>>>>>> 8G (50% of your memory) and check that it is really running that
>>>>>>>> way. In my case, on CentOS 6, I put this in /etc/conf.d/elasticsearch.
>>>>>>>>
>>>>>>>> Good luck.
>>>>>>>>
>>>>>>>> On Friday, February 6, 2015 at 12:58:27 PM UTC+1, Christoph Fürstaller wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Yesterday we updated our Graylog2/Elasticsearch cluster. The
>>>>>>>>> Elasticsearch cluster consists of 3 physical machines (DL380 G7,
>>>>>>>>> E5620, 16GB RAM) on RHEL 6.6. Each ES node gets 4GB RAM. On one
>>>>>>>>> host the graylog2 server/interface is installed. Until yesterday we
>>>>>>>>> used Elasticsearch 0.90.10-1 and graylog2-0.20.3. Yesterday we
>>>>>>>>> updated graylog2 to 0.90.0, started everything, and everything was
>>>>>>>>> running fine. Then we stopped graylog2 and the Elasticsearch
>>>>>>>>> cluster and upgraded ES to 1.3.4 and graylog to 0.92.4. The ES
>>>>>>>>> upgrade was successful; after that we started graylog2, which
>>>>>>>>> connected to the cluster and showed everything.
>>>>>>>>>
>>>>>>>>> In the ES cluster there are 7 indices with approx. 20 million
>>>>>>>>> messages each. The last 3 indices are open, the others are closed.
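The "50% of your memory" heap advice above can be expressed as a small helper. A sketch under two assumptions not stated in the thread: the usual Elasticsearch guidance of giving the heap half of physical RAM, and capping it at 31 GB so the JVM keeps compressed object pointers enabled:

```python
def es_heap_size_gb(ram_gb: int) -> int:
    """Suggested ES_HEAP_SIZE: half of RAM, capped at 31 GB (compressed-oops limit)."""
    return min(ram_gb // 2, 31)

# The 16 GB machines in this thread -> an 8G heap, matching the advice above.
print(es_heap_size_gb(16))  # 8
```

On these 16 GB boxes the cap never kicks in; it only matters on machines with more than about 64 GB of RAM.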
>>>>>>>>> Graylog2 sees approx. 50 million messages. New messages arrive at
>>>>>>>>> approx. 5 msg/sec.
>>>>>>>>>
>>>>>>>>> In the logs from graylog2-server there are messages like this,
>>>>>>>>> every couple of minutes:
>>>>>>>>>
>>>>>>>>> org.graylog2.periodical.GarbageCollectionWarningThread - Last GC
>>>>>>>>> run with PS Scavenge took longer than 1 second
>>>>>>>>>
>>>>>>>>> It seems graylog is running fine, a bit slow on searches, but fine.
>>>>>>>>>
>>>>>>>>> Attached are the config files for graylog2 and elasticsearch.
>>>>>>>>>
>>>>>>>>> Can someone give us a hint where these warnings come from? What can
>>>>>>>>> we tweak? It would be very helpful!
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Chris...

--
You received this message because you are subscribed to the Google Groups "graylog2" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
