I found a solution for my problem with the GC. I was using an old Java version, 1.7.0_21; with the newer version 1.7.0_71 the GC warnings are gone and everything is running fine!
On Wednesday, February 11, 2015 at 9:54:59 AM UTC+1, Christoph Fürstaller wrote:
>
> That's not good :/
> Graylog2 is behaving strangely. It's been running for approx. 17 hours now,
> and these warnings appear every few seconds:
>
> 2015-02-11 09:40:59,393 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3015 milliseconds)
> 2015-02-11 09:41:03,190 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3024 milliseconds)
> 2015-02-11 09:41:06,986 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3181 milliseconds)
> 2015-02-11 09:41:11,303 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3048 milliseconds)
> 2015-02-11 09:41:11,304 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS MarkSweep took longer than 1 second (last duration=159306 milliseconds)
> 2015-02-11 09:41:15,169 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2575 milliseconds)
> 2015-02-11 09:41:19,652 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2838 milliseconds)
>
> As seen in the HQ plugin for ES, the cluster is fine. The search query/fetch
> could be faster ...
>
> Summary (columns: host1.test.local | host3.test.local | host2.test.local | graylog.test.local)
>
> IP Address: 192.168.0.1:9300 | 192.168.0.3:9300 | 192.168.0.2:9300 | 192.168.0.3:9350
> Node ID: Fbiyz9krQq-KkhxxxI5NQQ | N-GgJy4aR1ecMxxxJPIE2Q | aWH0zgNSRJuoQZAxxxbUSQ | DJdD6KJtSM6uoxxxCGTIoQ
> ES Uptime: 0.69 days | 0.69 days | 0.69 days | 0.69 days
>
> File System
> Store Size: 12.4GB | 12.4GB | 12.4GB | 0.0
> # Documents: 52,278,572 | 52,278,572 | 52,278,572 | 0
> Documents Deleted: 0% | 0% | 0% | 0%
> Merge Size: 13.1GB | 12.8GB | 13.1GB | 0.0
> Merge Time: 00:18:35 | 00:17:08 | 00:18:15 | 00:00:00
> Merge Rate: 12.6 MB/s | 13.4 MB/s | 12.9 MB/s | 0 MB/s
> File Descriptors: 565 | 561 | 555 | 366
>
> Index Activity
> Indexing - Index: 0.71ms | 1.06ms | 0.71ms | 0ms
> Indexing - Delete: 0ms | 0ms | 0ms | 0ms
> Search - Query: 1031.5ms | 1076.11ms | 965.14ms | 0ms
> Search - Fetch: 47.5ms | 61ms | 101ms | 0ms
> Get - Total: 0ms | 0ms | 0ms | 0ms
> Get - Exists: 0ms | 0ms | 0ms | 0ms
> Get - Missing: 0ms | 0ms | 0ms | 0ms
> Refresh: 3.97ms | 3.53ms | 3.99ms | 0ms
> Flush: 31.64ms | 54.56ms | 33.04ms | 0ms
>
> Cache Activity
> Field Size: 92.9MB | 93.4MB | 93.2MB | 0.0
> Field Evictions: 0 | 0 | 0 | 0
> Filter Cache Size: 1.4KB | 1.4KB | 144.0B | 0.0
> Filter Evictions: 0 per query | 0 per query | 0 per query | 0 per query
> ID Cache Size:
> % ID Cache: 0% | 0% | 0% | 0%
>
> Memory
> Total Memory: 16 gb | 16 gb | 16 gb | 0 gb
> Heap Size: 5.9 gb | 5.9 gb | 5.9 gb | 0.1 gb
> Heap % of RAM: 38.1% | 38.1% | 38.1% | 0%
> % Heap Used: 8% | 13.1% | 10.7% | 80.4%
> GC MarkSweep Frequency: 0 s | 0 s | 0 s | 0 s
> GC MarkSweep Duration: 0ms | 0ms | 0ms | 0ms
> GC ParNew Frequency: 0 s | 0 s | 0 s | 0 s
> GC ParNew Duration: 0ms | 0ms | 0ms | 0ms
> G1 GC Young Generation Freq: 0 s | 0 s | 0 s | 0 s
> G1 GC Young Generation Duration: 0ms | 0ms | 0ms | 0ms
> G1 GC Old Generation Freq: 0 s | 0 s | 0 s | 0 s
> G1 GC Old Generation Duration: 0ms | 0ms | 0ms | 0ms
> Swap Space: 0.0000 mb | 0.0000 mb | 0.0000 mb | undefined mb
>
> Network
> HTTP Connection Rate: 0 /second | 0 /second | 0 /second | 0 /second
>
> Any ideas where the problems with the GC come from?
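As a side note, the pause durations buried in warnings like the ones quoted above can be pulled out with a quick script. A minimal sketch; the regex only targets the `last duration=N milliseconds` suffix that Graylog's GarbageCollectionWarningThread emits:

```python
import re

# Matches the "last duration=<n> milliseconds" suffix of the GC warning lines.
DURATION_RE = re.compile(r"last duration=(\d+) milliseconds")

def gc_pause_stats(log_lines):
    """Return (count, max_ms, mean_ms) of GC pauses found in the given lines."""
    pauses = [int(m.group(1)) for line in log_lines
              if (m := DURATION_RE.search(line))]
    if not pauses:
        return (0, 0, 0.0)
    return (len(pauses), max(pauses), sum(pauses) / len(pauses))

warnings = [
    "... PS Scavenge took longer than 1 second (last duration=3015 milliseconds)",
    "... PS MarkSweep took longer than 1 second (last duration=159306 milliseconds)",
]
print(gc_pause_stats(warnings))  # (2, 159306, 81160.5)
```

Run against the warnings above, the 159-second PS MarkSweep pause stands out immediately; pauses that long usually point at the JVM rather than at Elasticsearch itself.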
> On Tuesday, February 10, 2015 at 3:57:25 PM UTC+1, Arie wrote:
>>
>> Not 100% sure; I read about it and it looks fine. We are running with an
>> explicit master node.
>>
>> I now see I was confused by your question, because it seems more
>> graylog-related. Looking at your config, I am not seeing strange things.
>>
>> On Tuesday, February 10, 2015 at 3:17:49 PM UTC+1, Christoph Fürstaller wrote:
>>>
>>> Correct! But the other two could take this role if the master goes down.
>>> Am I right? So my setup is fine. Or do I misunderstand something?
>>>
>>> On Tuesday, February 10, 2015 at 2:51:20 PM UTC+1, Arie wrote:
>>>>
>>>> When running, only one server can be master. This server regulates all
>>>> the logic of your ES cluster, and it is the one that graylog is talking to.
>>>>
>>>> On Tuesday, February 10, 2015 at 2:04:52 PM UTC+1, Christoph Fürstaller wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for the configuration docs.
>>>>>
>>>>> Can I really run into split brain? I have 3 nodes, and they are all
>>>>> equal: every one of them can be a master and will store data. With
>>>>> discovery.zen.minimum_master_nodes: 2 I can't get a split brain. Or am
>>>>> I wrong? Or is this setup not ideal?
>>>>>
>>>>> Chris...
>>>>>
>>>>> On Tuesday, February 10, 2015 at 1:38:06 PM UTC+1, Arie wrote:
>>>>>>
>>>>>> You could bump into a split-brain situation running all ES nodes as
>>>>>> master.
>>>>>>
>>>>>> Check out this to configure your cluster:
>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_important_configuration_changes.html#_minimum_master_nodes
>>>>>>
>>>>>> On Tuesday, February 10, 2015 at 12:09:33 AM UTC+1, Christoph Fürstaller wrote:
>>>>>>>
>>>>>>> Thanks for your answer!
>>>>>>>
>>>>>>> About the master/data nodes: what happens when the master goes down?
>>>>>>> Will one of the 'slaves' become a master?
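For reference, the quorum rule behind `discovery.zen.minimum_master_nodes` discussed above is simply a majority of master-eligible nodes, (N / 2) + 1, as described in the linked Elasticsearch guide. A small sketch:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for ES zen discovery: majority of master-eligible nodes."""
    return master_eligible // 2 + 1

# With 3 master-eligible nodes, as in the cluster discussed in this thread,
# quorum is 2, so an isolated single node can never elect itself master.
print(minimum_master_nodes(3))  # 2
```

So the setting of 2 for this 3-node cluster matches the recommendation, which is why the GC problem turned out to lie elsewhere.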
>>>>>>> I configured all 3 as master for redundancy, so the cluster still
>>>>>>> survives if only one node is present. Is this assumption wrong?
>>>>>>>
>>>>>>> I've increased the ES_HEAP_SIZE to 6G before, with the same results.
>>>>>>>
>>>>>>> Chris...
>>>>>>>
>>>>>>> On Monday, February 9, 2015 at 8:30:28 PM UTC+1, Arie wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Looking at your config in elasticsearch.yml, the following comes to
>>>>>>>> mind.
>>>>>>>>
>>>>>>>> One node should be:
>>>>>>>> node.master: true
>>>>>>>> node.data: true
>>>>>>>>
>>>>>>>> and the other two nodes:
>>>>>>>> node.master: false
>>>>>>>> node.data: false
>>>>>>>>
>>>>>>>> In elasticsearch.conf, ES_HEAP_SIZE: you can take this easily up to
>>>>>>>> 8G (50% of your memory) and check that it is really running that
>>>>>>>> way. In my case, on CentOS 6, I put this in /etc/conf.d/elasticsearch.
>>>>>>>>
>>>>>>>> Good luck.
>>>>>>>>
>>>>>>>> On Friday, February 6, 2015 at 12:58:27 PM UTC+1, Christoph Fürstaller wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Yesterday we updated our Graylog2/Elasticsearch cluster. The
>>>>>>>>> Elasticsearch cluster consists of 3 physical machines (DL380 G7,
>>>>>>>>> E5620, 16GB RAM) on RHEL 6.6. Each ES node gets 4GB RAM. On one
>>>>>>>>> host the graylog2 server/interface is installed. Until yesterday we
>>>>>>>>> used Elasticsearch 0.90.10-1 and graylog2-0.20.3. Yesterday we
>>>>>>>>> updated graylog2 to 0.90.0, started everything, and everything was
>>>>>>>>> running fine. Then we stopped graylog2 and the Elasticsearch
>>>>>>>>> cluster and upgraded ES to 1.3.4 and graylog to 0.92.4. The ES
>>>>>>>>> upgrade was successful; after that we started graylog2, which
>>>>>>>>> connected to the cluster and showed everything.
>>>>>>>>>
>>>>>>>>> In the ES cluster there are 7 indices with approx. 20 million
>>>>>>>>> messages each. The last 3 indices are open, the others are closed.
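The "50% of your memory" heap advice above can be expressed as a small helper. A sketch under two assumptions not stated in the thread: the usual Elasticsearch guidance of giving the heap half of physical RAM, and capping it at 31 GB so the JVM keeps compressed object pointers enabled:

```python
def es_heap_size_gb(ram_gb: int) -> int:
    """Suggested ES_HEAP_SIZE: half of RAM, capped at 31 GB (compressed-oops limit)."""
    return min(ram_gb // 2, 31)

# The 16 GB machines in this thread -> an 8G heap, matching the advice above.
print(es_heap_size_gb(16))  # 8
```

On these 16 GB boxes the cap never kicks in; it only matters on machines with more than about 64 GB of RAM.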
>>>>>>>>> Graylog2 sees approx. 50 million messages. New messages arrive at
>>>>>>>>> approx. 5 msg/sec.
>>>>>>>>>
>>>>>>>>> In the logs from graylog2-server there are messages like this,
>>>>>>>>> every couple of minutes:
>>>>>>>>>
>>>>>>>>> org.graylog2.periodical.GarbageCollectionWarningThread - Last GC
>>>>>>>>> run with PS Scavenge took longer than 1 second
>>>>>>>>>
>>>>>>>>> It seems graylog is running fine, a bit slow on searches, but fine.
>>>>>>>>>
>>>>>>>>> Attached are the config files for graylog2 and elasticsearch.
>>>>>>>>>
>>>>>>>>> Can someone give us a hint where these warnings come from? What can
>>>>>>>>> we tweak? It would be very helpful!
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Chris...

--
You received this message because you are subscribed to the Google Groups "graylog2" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
