That's not good :/ because graylog2 is behaving strangely. It has been running for approx. 17 hours now, and these warnings appear every few seconds:

2015-02-11 09:40:59,393 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3015 milliseconds)
2015-02-11 09:41:03,190 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3024 milliseconds)
2015-02-11 09:41:06,986 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3181 milliseconds)
2015-02-11 09:41:11,303 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=3048 milliseconds)
2015-02-11 09:41:11,304 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS MarkSweep took longer than 1 second (last duration=159306 milliseconds)
2015-02-11 09:41:15,169 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2575 milliseconds)
2015-02-11 09:41:19,652 WARN : org.graylog2.periodical.GarbageCollectionWarningThread - Last GC run with PS Scavenge took longer than 1 second (last duration=2838 milliseconds)
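To see in detail what those pauses are doing, it may help to turn on GC logging for the graylog2-server JVM. A minimal sketch, assuming your startup script honours a JAVA_OPTS-style variable (the variable name and log path below are assumptions; adjust for your init script):

```shell
# Hypothetical addition to the graylog2-server startup environment.
# These are standard HotSpot (Java 7) flags; the log path is a placeholder.
JAVA_OPTS="$JAVA_OPTS -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/graylog2/gc.log"
```

The resulting gc.log would show whether the long pauses are frequent young collections (PS Scavenge) or full collections (PS MarkSweep), and how full the heap is before each run.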
As seen in the HQ plugin for ES, the cluster is fine. The search query/fetch could be faster ...

Summary
  Node Name:          host1.test.local        host3.test.local        host2.test.local        graylog.test.local
  IP Address:         192.168.0.1:9300        192.168.0.3:9300        192.168.0.2:9300        192.168.0.3:9350
  Node ID:            Fbiyz9krQq-KkhxxxI5NQQ  N-GgJy4aR1ecMxxxJPIE2Q  aWH0zgNSRJuoQZAxxxbUSQ  DJdD6KJtSM6uoxxxCGTIoQ
  ES Uptime:          0.69 days               0.69 days               0.69 days               0.69 days

File System
  Store Size:         12.4GB        12.4GB        12.4GB        0.0
  # Documents:        52,278,572    52,278,572    52,278,572    0
  Documents Deleted:  0%            0%            0%            0%
  Merge Size:         13.1GB        12.8GB        13.1GB        0.0
  Merge Time:         00:18:35      00:17:08      00:18:15      00:00:00
  Merge Rate:         12.6 MB/s     13.4 MB/s     12.9 MB/s     0 MB/s
  File Descriptors:   565           561           555           366

Index Activity
  Indexing - Index:   0.71ms        1.06ms        0.71ms        0ms
  Indexing - Delete:  0ms           0ms           0ms           0ms
  Search - Query:     1031.5ms      1076.11ms     965.14ms      0ms
  Search - Fetch:     47.5ms        61ms          101ms         0ms
  Get - Total:        0ms           0ms           0ms           0ms
  Get - Exists:       0ms           0ms           0ms           0ms
  Get - Missing:      0ms           0ms           0ms           0ms
  Refresh:            3.97ms        3.53ms        3.99ms        0ms
  Flush:              31.64ms       54.56ms       33.04ms       0ms

Cache Activity
  Field Size:         92.9MB        93.4MB        93.2MB        0.0
  Field Evictions:    0             0             0             0
  Filter Cache Size:  1.4KB         1.4KB         144.0B        0.0
  Filter Evictions:   0 per query   0 per query   0 per query   0 per query
  ID Cache Size:
  % ID Cache:         0%            0%            0%            0%

Memory
  Total Memory:       16 gb         16 gb         16 gb         0 gb
  Heap Size:          5.9 gb        5.9 gb        5.9 gb        0.1 gb
  Heap % of RAM:      38.1%         38.1%         38.1%         0%
  % Heap Used:        8%            13.1%         10.7%         80.4%
  GC MarkSweep Frequency:           0 s     0 s     0 s     0 s
  GC MarkSweep Duration:            0ms     0ms     0ms     0ms
  GC ParNew Frequency:              0 s     0 s     0 s     0 s
  GC ParNew Duration:               0ms     0ms     0ms     0ms
  G1 GC Young Generation Freq:      0 s     0 s     0 s     0 s
  G1 GC Young Generation Duration:  0ms     0ms     0ms     0ms
  G1 GC Old Generation Freq:        0 s     0 s     0 s     0 s
  G1 GC Old Generation Duration:    0ms     0ms     0ms     0ms
  Swap Space:         0.0000 mb     0.0000 mb     0.0000 mb     undefined mb

Network
  HTTP Connection Rate:  0 /second  0 /second  0 /second  0 /second

Any ideas where the problems
with the GC come from?

On Tuesday, February 10, 2015 at 3:57:25 PM UTC+1, Arie wrote:
>
> Not 100% sure; I read about it and it looks fine.
> We are running with an explicit master node.
>
> I now see I was confused by your question, because it seems more graylog related.
> Looking at your config I am not seeing strange things.
>
> On Tuesday, February 10, 2015 at 3:17:49 PM UTC+1, Christoph Fürstaller wrote:
>>
>> Correct! But the other two could take this role if the master goes down.
>> Am I right? So my setup is fine. Or do I misunderstand something?
>>
>> On Tuesday, February 10, 2015 at 2:51:20 PM UTC+1, Arie wrote:
>>>
>>> When running, only one server can be master. This server regulates
>>> all the logic of your ES cluster, and is the one that graylog is talking to.
>>>
>>> On Tuesday, February 10, 2015 at 2:04:52 PM UTC+1, Christoph Fürstaller wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks for the configuration documentation.
>>>>
>>>> Can I really run into split brain? I have 3 nodes, and they are all
>>>> equal. Every one of them can be a master and will store data. With
>>>> discovery.zen.minimum_master_nodes: 2 I can't get a split brain. Or am
>>>> I wrong? Or is this setup not ideal?
>>>>
>>>> Chris...
>>>>
>>>> On Tuesday, February 10, 2015 at 1:38:06 PM UTC+1, Arie wrote:
>>>>>
>>>>> You could bump into a split-brain situation running all ES nodes as
>>>>> master.
>>>>>
>>>>> Check out this to configure your cluster:
>>>>>
>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_important_configuration_changes.html#_minimum_master_nodes
>>>>>
>>>>> On Tuesday, February 10, 2015 at 12:09:33 AM UTC+1, Christoph Fürstaller wrote:
>>>>>>
>>>>>> Thanks for your answer!
>>>>>>
>>>>>> About the master/data nodes: what happens when the master goes down?
>>>>>> Will one of the 'slaves' become a master?
>>>>>> I configured all 3 as master for redundancy, so the cluster still
>>>>>> survives if only one node is present. Is this assumption wrong?
>>>>>>
>>>>>> I've increased the ES_HEAP_SIZE to 6G before, with the same results.
>>>>>>
>>>>>> Chris...
>>>>>>
>>>>>> On Monday, February 9, 2015 at 8:30:28 PM UTC+1, Arie wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Looking at your config in elasticsearch.yml, the following comes to mind.
>>>>>>>
>>>>>>> One node should be:
>>>>>>> node.master: true
>>>>>>> node.data: true
>>>>>>>
>>>>>>> and for the other two nodes:
>>>>>>> node.master: false
>>>>>>> node.data: false
>>>>>>>
>>>>>>> In the elasticsearch conf file:
>>>>>>> ES_HEAP_SIZE
>>>>>>>
>>>>>>> You can take this easily up to 8G (50% of your memory) and check
>>>>>>> that it is really running that way. In my case, on CentOS 6, I put
>>>>>>> this in /etc/conf.d/elasticsearch.
>>>>>>>
>>>>>>> Good luck.
>>>>>>>
>>>>>>> On Friday, February 6, 2015 at 12:58:27 PM UTC+1, Christoph Fürstaller wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Yesterday we updated our Graylog2/Elasticsearch cluster. The
>>>>>>>> Elasticsearch cluster consists of 3 physical machines: DL380 G7, E5620,
>>>>>>>> 16GB RAM, on RHEL 6.6. Each ES node gets 4GB RAM. On one host the
>>>>>>>> graylog2 server/interface is installed. Until yesterday we used
>>>>>>>> Elasticsearch 0.90.10-1 and graylog2-0.20.3. Yesterday we updated
>>>>>>>> graylog2 to 0.90.0, started everything, and everything was running fine.
>>>>>>>> Then we stopped graylog2 and the Elasticsearch cluster and upgraded ES
>>>>>>>> to 1.3.4 and graylog to 0.92.4. The ES upgrade was successful; after
>>>>>>>> that we started graylog2, which connected to the cluster and showed
>>>>>>>> everything.
>>>>>>>>
>>>>>>>> In the ES cluster there are 7 indices with about 20 million messages
>>>>>>>> each. The last 3 indices are open, the others closed. Graylog2 sees
>>>>>>>> approx. 50 million messages.
>>>>>>>> New messages arrive at approx. 5 msg/sec.
>>>>>>>>
>>>>>>>> In the logs from graylog2-server there are messages like this,
>>>>>>>> every couple of minutes:
>>>>>>>> org.graylog2.periodical.GarbageCollectionWarningThread - Last GC
>>>>>>>> run with PS Scavenge took longer than 1 second
>>>>>>>>
>>>>>>>> It seems graylog is running fine, a bit slow on searches, but fine.
>>>>>>>>
>>>>>>>> Attached are the config files for graylog2 and elasticsearch.
>>>>>>>>
>>>>>>>> Can someone give us a hint where these warnings come from? What can
>>>>>>>> we tweak? That would be very helpful!
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Chris...

--
You received this message because you are subscribed to the Google Groups "graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
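For reference, the three-equal-nodes setup discussed in this thread can be sketched in elasticsearch.yml roughly as follows. This is only a sketch for ES 1.x: the cluster name and host list are placeholders, not taken from the attached configs.

```yaml
# Sketch for each of the 3 equal nodes (cluster name and hosts are placeholders)
cluster.name: graylog2
node.master: true    # every node is master-eligible
node.data: true      # every node stores data
# With 3 master-eligible nodes, a quorum of 2 prevents split brain:
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["host1.test.local", "host2.test.local", "host3.test.local"]
```

With minimum_master_nodes set to 2, a partitioned single node cannot elect itself master, so the cluster survives the loss of any one node but never forms two independent masters.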
