Hi, I checked the Elasticsearch log and I don't see anything special. The cluster status is green.
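In case it helps, here is a quick way to confirm the state directly on the Elasticsearch node (using the publish_address from the log below; adjust host and port if your setup binds differently):

    curl -XGET 'http://172.25.232.45:9200/_cluster/health?pretty'
    curl -XGET 'http://172.25.232.45:9200/_cat/indices?v'
    curl -XGET 'http://172.25.232.45:9200/_cat/shards?v'

The first call shows the overall status and the unassigned shard count, and _cat/shards lists every shard with its state, so a stuck index stands out immediately.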
This is the last log file:

2016-06-26_09:51:28.78352 [2016-06-26 12:51:28,782][INFO ][node ] [Glenn Talbot] version[2.3.1], pid[953], build[bd98092/2016-04-04T12:25:05Z]
2016-06-26_09:51:28.79783 [2016-06-26 12:51:28,794][INFO ][node ] [Glenn Talbot] initializing ...
2016-06-26_09:51:30.17146 [2016-06-26 12:51:30,171][INFO ][plugins ] [Glenn Talbot] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
2016-06-26_09:51:30.29289 [2016-06-26 12:51:30,292][INFO ][env ] [Glenn Talbot] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
2016-06-26_09:51:30.29564 [2016-06-26 12:51:30,294][INFO ][env ] [Glenn Talbot] heap size [37.6gb], compressed ordinary object pointers [false]
2016-06-26_09:51:30.29766 [2016-06-26 12:51:30,294][WARN ][env ] [Glenn Talbot] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
2016-06-26_09:51:34.69050 [2016-06-26 12:51:34,690][INFO ][node ] [Glenn Talbot] initialized
2016-06-26_09:51:34.69107 [2016-06-26 12:51:34,690][INFO ][node ] [Glenn Talbot] starting ...
2016-06-26_09:51:35.32863 [2016-06-26 12:51:35,328][INFO ][transport ] [Glenn Talbot] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
2016-06-26_09:51:35.33658 [2016-06-26 12:51:35,336][INFO ][discovery ] [Glenn Talbot] graylog-production/th7wM-a9ThaAY_umCV3v2w
2016-06-26_09:51:45.37933 [2016-06-26 12:51:45,379][INFO ][cluster.service ] [Glenn Talbot] new_master {Glenn Talbot}{th7wM-a9ThaAY_umCV3v2w}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
2016-06-26_09:51:45.40239 [2016-06-26 12:51:45,402][INFO ][http ] [Glenn Talbot] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
2016-06-26_09:51:45.40350 [2016-06-26 12:51:45,403][INFO ][node ] [Glenn Talbot] started
2016-06-26_09:51:45.53808 [2016-06-26 12:51:45,537][INFO ][gateway ] [Glenn Talbot] recovered [1] indices into cluster_state
2016-06-26_09:51:45.87525 [2016-06-26 12:51:45,875][INFO ][cluster.routing.allocation] [Glenn Talbot] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
2016-06-26_09:57:01.91281 [2016-06-26 12:57:01,912][INFO ][cluster.service ] [Glenn Talbot] added {{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(join from node[{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false}])
2016-06-26_10:17:02.43148 [2016-06-26 13:17:02,428][INFO ][cluster.metadata ] [Glenn Talbot] [graylog_0] update_mapping [message]
2016-06-26_15:35:13.25159 [2016-06-26 18:35:13,250][INFO ][node ] [Glenn Talbot] stopping ...
2016-06-26_15:35:13.32027 [2016-06-26 18:35:13,319][INFO ][node ] [Glenn Talbot] stopped
2016-06-26_15:35:13.32153 [2016-06-26 18:35:13,320][INFO ][node ] [Glenn Talbot] closing ...
2016-06-26_15:35:13.33032 [2016-06-26 18:35:13,329][INFO ][node ] [Glenn Talbot] closed
2016-06-26_15:46:49.97957 [2016-06-26 18:46:49,977][INFO ][node ] [Tether] version[2.3.1], pid[1364], build[bd98092/2016-04-04T12:25:05Z]
2016-06-26_15:46:49.97959 [2016-06-26 18:46:49,978][INFO ][node ] [Tether] initializing ...
2016-06-26_15:46:50.52052 [2016-06-26 18:46:50,519][INFO ][plugins ] [Tether] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
2016-06-26_15:46:50.54693 [2016-06-26 18:46:50,546][INFO ][env ] [Tether] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
2016-06-26_15:46:50.54734 [2016-06-26 18:46:50,546][INFO ][env ] [Tether] heap size [37.6gb], compressed ordinary object pointers [false]
2016-06-26_15:46:50.54871 [2016-06-26 18:46:50,547][WARN ][env ] [Tether] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
2016-06-26_15:46:53.00370 [2016-06-26 18:46:53,003][INFO ][node ] [Tether] initialized
2016-06-26_15:46:53.00560 [2016-06-26 18:46:53,003][INFO ][node ] [Tether] starting ...
2016-06-26_15:46:54.29760 [2016-06-26 18:46:54,297][INFO ][transport ] [Tether] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
2016-06-26_15:46:54.30807 [2016-06-26 18:46:54,307][INFO ][discovery ] [Tether] graylog-production/kMz-P-dcQZCObsEMIXZtxQ
2016-06-26_15:47:04.35293 [2016-06-26 18:47:04,352][INFO ][cluster.service ] [Tether] new_master {Tether}{kMz-P-dcQZCObsEMIXZtxQ}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
2016-06-26_15:47:04.36292 [2016-06-26 18:47:04,362][DEBUG][action.admin.cluster.state] [Tether] no known master node, scheduling a retry
2016-06-26_15:47:04.36567 [2016-06-26 18:47:04,364][DEBUG][action.admin.cluster.health] [Tether] no known master node, scheduling a retry
2016-06-26_15:47:04.38578 [2016-06-26 18:47:04,385][INFO ][http ] [Tether] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
2016-06-26_15:47:04.38617 [2016-06-26 18:47:04,385][INFO ][node ] [Tether] started
2016-06-26_15:47:04.42204 [2016-06-26 18:47:04,421][INFO ][gateway ] [Tether] recovered [1] indices into cluster_state
2016-06-26_15:47:04.76029 [2016-06-26 18:47:04,759][INFO ][cluster.routing.allocation] [Tether] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).

On Monday, June 27, 2016 at 14:53:20 UTC+3, Marius Sturm wrote:
>
> Hi,
> this all boils down to an unstable Elasticsearch instance. When Graylog is
> not able to forward log messages to ES, it buffers them on disk and tries to
> send them later. This is called the journal.
> So when your ES service is not running properly, the journal fills up with
> messages. Please take a look into the ES logs to figure out why it has
> problems with message ingestion.
> You can find them in /var/log/graylog/elasticsearch/current
>
> Cheers,
> Marius
>
> On 27 June 2016 at 13:39, John <yonatha...@gmail.com> wrote:
>
>> 1 and 4,
>> and the Graylog server node is not sending data to Elasticsearch.
>> I deleted the journal but it doesn't help.
>> The problems began a few days after I upgraded from 1.3 to 2.0.2.
>>
>> On Monday, June 27, 2016 at 14:30:28 UTC+3, Joe K wrote:
>>
>>> Which problem out of the 4?
>>>
>>> On Monday, June 27, 2016 at 2:00:14 PM UTC+3, John wrote:
>>>>
>>>> Hi Joe,
>>>> I have exactly the same problem, a few days after I upgraded from 1.3 to
>>>> 2.0.2.
>>>> Did you manage to fix this issue?
>>>>
>>>> On Thursday, May 26, 2016 at 14:02:19 UTC+3, Joe K wrote:
>>>>>
>>>>> - We run it on a t2.medium (4 GB RAM, 2 cores).
>>>>> - About 1 incoming message per second.
>>>>> - Tried 2.0.0 and now running 2.0.1.
>>>>>
>>>>> Does anyone use the image in a real-world application? The Graylog 2.0 image
>>>>> fails after a few days. Is this a problem with the image or with Graylog in
>>>>> general?
>>>>>
>>>>> It runs fine for about a week. After that there are errors and search stops
>>>>> working. Search requests time out.
>>>>> There are many errors and they are very cryptic, and a Google search does not
>>>>> give any solutions for how to manage them:
>>>>>
>>>>> *1. After about a week we get the error "Uncommited messages deleted from
>>>>> journal"*
>>>>>
>>>>>> Uncommited messages deleted from journal (triggered 9 days ago)
>>>>>> Some messages were deleted from the Graylog journal before they could
>>>>>> be written to Elasticsearch. Please verify that your Elasticsearch cluster
>>>>>> is healthy and fast enough. You may also want to review your Graylog
>>>>>> journal settings and set a higher limit. (Node: f12...
>>>>>
>>>>> What should we do about this? What is the "journal"? A Google search produces
>>>>> no answers.
>>>>>
>>>>> *2. About 4 days after a clean install it always triggers "Cluster unhealthy"*
>>>>>
>>>>>> "Elasticsearch cluster unhealthy (RED)"
>>>>>> "The Elasticsearch cluster state is RED which means shards are
>>>>>> unassigned. This usually indicates a crashed and corrupt cluster and needs
>>>>>> to be investigated. Graylog will write into the local disk journal. Read
>>>>>> how to fix this in the Elasticsearch setup documentation."
>>>>>
>>>>> When you go to that documentation link it says "The red status indicates
>>>>> that some or all of the primary shards are not available. In this state, no
>>>>> searches can be performed until all primary shards are restored."
>>>>> That's it. What are you supposed to do?
>>>>> After a long search we finally found one solution: this was cured once with
>>>>> *curl -XPUT 'localhost:9200/_settings' -d '{ "index" : {
>>>>> "number_of_replicas" : 0}}'*
>>>>> The next time it happened, we tried the solution again, but the response was
>>>>> *{"acknowledged":false}*
>>>>> So what now???
>>>>>
>>>>> *3. Every time we perform graylog-ctl restart, four more unassigned shards
>>>>> appear:*
>>>>> Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0
>>>>> relocating, 8 unassigned
>>>>> graylog-ctl restart
>>>>> Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0
>>>>> relocating, 12 unassigned
>>>>> Etc.
>>>>>
>>>>> *4. Journal utilization is too high, without any hint on how to set it
>>>>> higher.*
>>>>>
>>>>>> Journal utilization is too high (triggered 11 days ago)
>>>>>> Journal utilization is too high and may go over the limit soon.
>>>>>> Please verify that your Elasticsearch cluster is healthy and fast enough.
>>>>>> You may also want to review your Graylog journal settings and set a higher
>>>>>> limit. (Node: f121
>>>>>
>>>>> What is this "journal"? And how do we set it "higher"?
>>>>>
>>>>> Please help!
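PS: for anyone else hitting the "what is this journal" question in the quoted thread, the journal is Graylog's own on-disk message buffer, and its limits are Graylog server settings, not Elasticsearch settings. A rough sketch of the relevant options, assuming a package install where the config is /etc/graylog/server/server.conf (the appliance/OVA image manages this file itself, so the path and values below are only an example; 5gb/12h are, as far as I know, the defaults):

    # keep the disk journal enabled so messages survive ES hiccups
    message_journal_enabled = true
    # where the journal segments live; the disk needs enough free space for the limit below
    message_journal_dir = /var/lib/graylog-server/journal
    # raise these to buffer more messages before the "utilization too high" notification fires
    message_journal_max_size = 5gb
    message_journal_max_age = 12h

After changing them, graylog-server has to be restarted for the new limits to take effect.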