I will let you know ;)

On Sunday, December 11, 2016 at 11:01:34 PM UTC-6, BKeep wrote:
> No one has any ideas?
> On Friday, December 9, 2016 at 12:17:17 PM UTC-6, BKeep wrote:
>> It happened again today. However, for whatever reason, stopping and then 
>> starting the graylog-server.service did not allow operations to continue 
>> when graylog came back up. I had to disconnect the pipeline processor. 
>> Once I did that, logs started flowing to elasticsearch again.
>> Without running anything through the pipeline processor this is the load 
>> on the system. 
>> top - 12:12:22 up 20 days, 19:51,  1 user,  load average: 1.35, 2.60, 2.17
>> Tasks: 420 total,   1 running, 419 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  2.1 us,  0.8 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>> KiB Mem : 49282804 total, 36668764 free,  8659352 used,  3954688 buff/cache
>> KiB Swap:  1048572 total,  1048572 free,        0 used. 40362516 avail Mem
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ 
>> 18329 graylog   20   0 22.554g 6.058g  70608 S 161.5 12.9  28:45.28 java
>>  1905 elastic+  20   0 29.972g 1.941g  15520 S  17.3  4.1 220:31.90 java
>>  2030 mongod    20   0  444924  57660   7512 S   1.3  0.1 263:08.27 mongod
>> I'm open to suggestions.
>> Regards,
>> Brandon
>> On Wednesday, December 7, 2016 at 9:52:31 PM UTC-6, BKeep wrote:
>>> Our new Graylog instance has been running for a while without so much as 
>>> a hiccup. Recently, I added new log sources from a Security Onion sensor 
>>> containing BRO and Suricata logs. It doesn't appear these new inputs have 
>>> caused any noticeable load on the system, at least not until I run them 
>>> through a pipeline processor. I am using grok and standard regex functions 
>>> in the pipeline rules to parse out bro_conn, bro_dns, etc.
>>> Today, I noticed the output stopped during what I would consider peak 
>>> load around 11:30a<ish> CST and to the best of my recollection no changes 
>>> had been made directly preceding the stoppage.
>>> As I have been adding pipeline rules to parse out messages, it seems 
>>> something happens where logs stop writing to the elasticsearch nodes. I 
>>> don't see anything in the server.log that looks like a smoking gun. If I 
>>> restart the graylog-server.service the logs will not begin to clear from 
>>> the output buffer. However, if I stop the graylog-server.service and then 
>>> start it, logs begin to flow again. I do not have to restart any other 
>>> service after the manual stop/start of graylog.
>>> The only log I see that seems like it would be related is below. 
>>> However, I am not sure if it is relevant.
>>> 2016-12-07T12:27:53.601-06:00 WARN  [DeadEventLoggingListener] Received 
>>> unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from 
>>> event bus <AsyncEventBus{graylog-eventbus}>
>>> System info:
>>> top - 15:07:03 up 18 days, 22:46,  1 user,  load average: 0.68, 1.06, 1.12
>>> Tasks: 420 total,   1 running, 419 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s):  5.3 us,  1.6 sy,  0.0 ni, 93.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>>> KiB Mem : 49282804 total, 34329916 free, 11754760 used,  3198128 buff/cache
>>> KiB Swap:  1048572 total,  1048572 free,        0 used. 37276428 avail Mem
>>> Graylog 2.1.1+01d50e5 starting up
>>> JRE: 1.8.0_111 on Linux 3.10.0-327.36.3.el7.x86_64
>>> OS: CentOS Linux 7 (Core) amd64
>>> JVM arguments: -Xms8g -Xmx8g -XX:NewRatio=1 -XX:+ResizeTLAB 
>>> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled 
>>> -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC 
>>> -XX:-OmitStackTraceInFastThrow 
>>> -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml 
>>> -Djava.library.path=/opt/graylog-server/lib/sigar 
>>> -Dgraylog2.installation_source=unknown
>>> transparent_hugepage=false
>>> I noticed when checking messages against the pipeline simulator, I see 
>>> inconsistent results in the execution times. I see this from the same 
>>> messages and I also see it with different messages at different times so it 
>>> isn't something I can reproduce at will.
>>> Sample message:
>>> <13>1 2016-12-08T02:36:57+00:00 BROsensor bro_notice - - - 
>>> 1481164616.440593|CYxMwXBbL0TvQQKm9||51841||443|-|-|-|tcp|SSL::Invalid_Server_Cert|SSL
>>> certificate validation failed with (self signed certificate in certificate 
>>> chain)|
>>> ,CN=ADN,OU=ADN,O=External,L=Toledo,ST=Ohio,C=US|||443|-|BROsensor-eth1-1|Notice::ACTION_LOG|3600.000000|F|-|-|-|-|-
>>> Sample rule using grok:
>>> rule "Extract bro_notice log fields"
>>> when
>>>   has_field("message") AND
>>>   contains(value: to_string($message.application_name), search: 
>>> "bro_notice", ignore_case: true)
>>> then
>>>     let m = grok(
>>> "^(?<ts>%{NUMBER}|-).?(?<uid>%{WORD}|-).?(?<id_orig_h>%{IP}|-).?(?<id_orig_p>%{INT}|-).?(?<id_resp_h>%{IP}|-).?(?<id_resp_p>%{INT}|-).?(?<fuid>%{WORD}|-).?(?<file_mime_type>%{WORD}/%{WORD}|-).?(?<file_desc>%{WORD}|-).?(?<proto>%{WORD}|-).?(?<note>%{WORD}::%{WORD}|-).?(?<msg>%{BRO_URL}).?(?<sub>%{BRO_URL}).?(?<src>%{IP}|-).?(?<dst>%{IP}|-).?(?<p>%{INT}|-).?(?<n>%{INT}|-).?(?<peer_desc>%{HOSTNAME}|-).?(?<actions>%{WORD}::%{WORD}|-).?(?<suppress_for>%{INT}\\.%{INT}|-).?(?<dropped>[TF]).?(?<remote_location_country_code>%{WORD}|-).?(?<remote_location_region>%{WORD}|-).?(?<remote_location_city>%{WORD}|-).?(?<remote_location_latitude>%{INT}\\.%{INT}|-).?(?<remote_location_longitude>%{INT}\\.%{INT}|-).?$"
>>> , to_string($message.message), true);
>>>   set_fields(m);
>>> end
>>> Simulation results test 1:
>>> These are the results of processing the loaded message. Processing took 
>>> 54,366 µs.
>>> 1 μs
>>>     Starting message processing
>>> 51 μs
>>>     Message c93a5361-bcee-11e6-9154-78e7d17bef2e running [Pipeline 'Bro 
>>> IDS Logs' (58464b367d5d445614cc4f27)] for streams [
>>> 584195567d5d44076c4160f7]
>>> 84 μs
>>>     Enter Stage 0
>>> 90 μs
>>>     Evaluate Rule 'Extract bro_conn log fields' (
>>> 584190b57d5d44076c415c14) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 105 μs
>>>     Evaluation not satisfied Rule 'Extract bro_conn log fields' (
>>> 584190b57d5d44076c415c14) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 107 μs
>>>     Evaluate Rule 'Extract bro_dns log fields' (5843002a7d5d44076c42de64
>>> ) in Pipeline 'Bro IDS Logs' (58464b367d5d445614cc4f27)
>>> 116 μs
>>>     Evaluation not satisfied Rule 'Extract bro_dns log fields' (
>>> 5843002a7d5d44076c42de64) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 119 μs
>>>     Evaluate Rule 'Extract bro_http log fields' (
>>> 584372dc7d5d443ea323761e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 126 μs
>>>     Evaluation not satisfied Rule 'Extract bro_http log fields' (
>>> 584372dc7d5d443ea323761e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 129 μs
>>>     Evaluate Rule 'Extract bro_tunnels log fields' (
>>> 58476fb77d5d445614cd828a) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 136 μs
>>>     Evaluation not satisfied Rule 'Extract bro_tunnels log fields' (
>>> 58476fb77d5d445614cd828a) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 138 μs
>>>     Evaluate Rule 'Extract bro_notice log fields' (
>>> 584795807d5d445614cdaa4e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 147 μs
>>>     Evaluation satisfied Rule 'Extract bro_notice log fields' (
>>> 584795807d5d445614cdaa4e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 150 μs
>>>     Evaluate Rule 'Extract bro_weird log fields' (
>>> 5847a07f7d5d445614cdb5e2) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 157 μs
>>>     Evaluation not satisfied Rule 'Extract bro_weird log fields' (
>>> 5847a07f7d5d445614cdb5e2) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 161 μs
>>>     Evaluate Rule 'Extract bro_software log fields' (
>>> 5847b6017d5d445614cdcc85) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 168 μs
>>>     Evaluation not satisfied Rule 'Extract bro_software log fields' (
>>> 5847b6017d5d445614cdcc85) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 179 μs
>>>     Execute Rule 'Extract bro_notice log fields' (
>>> 584795807d5d445614cdaa4e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 54,335 μs
>>>     Completed Stage 0 for Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27), continuing to next stage
>>> 54,345 μs
>>>     Exit Stage 0
>>> 54,366 μs
>>>     Finished message processing
>>> Simulation results test 2:
>>> These are the results of processing the loaded message. Processing took 
>>> 381 µs.
>>> 3 μs
>>>     Starting message processing
>>> 36 μs
>>>     Message ead67e30-bcef-11e6-9154-78e7d17bef2e running [Pipeline 'Bro 
>>> IDS Logs' (58464b367d5d445614cc4f27)] for streams [
>>> 584195567d5d44076c4160f7]
>>> 62 μs
>>>     Enter Stage 0
>>> 67 μs
>>>     Evaluate Rule 'Extract bro_conn log fields' (
>>> 584190b57d5d44076c415c14) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 80 μs
>>>     Evaluation not satisfied Rule 'Extract bro_conn log fields' (
>>> 584190b57d5d44076c415c14) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 82 μs
>>>     Evaluate Rule 'Extract bro_dns log fields' (5843002a7d5d44076c42de64
>>> ) in Pipeline 'Bro IDS Logs' (58464b367d5d445614cc4f27)
>>> 88 μs
>>>     Evaluation not satisfied Rule 'Extract bro_dns log fields' (
>>> 5843002a7d5d44076c42de64) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 91 μs
>>>     Evaluate Rule 'Extract bro_http log fields' (
>>> 584372dc7d5d443ea323761e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 97 μs
>>>     Evaluation not satisfied Rule 'Extract bro_http log fields' (
>>> 584372dc7d5d443ea323761e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 100 μs
>>>     Evaluate Rule 'Extract bro_tunnels log fields' (
>>> 58476fb77d5d445614cd828a) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 106 μs
>>>     Evaluation not satisfied Rule 'Extract bro_tunnels log fields' (
>>> 58476fb77d5d445614cd828a) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 109 μs
>>>     Evaluate Rule 'Extract bro_notice log fields' (
>>> 584795807d5d445614cdaa4e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 116 μs
>>>     Evaluation satisfied Rule 'Extract bro_notice log fields' (
>>> 584795807d5d445614cdaa4e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 120 μs
>>>     Evaluate Rule 'Extract bro_weird log fields' (
>>> 5847a07f7d5d445614cdb5e2) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 134 μs
>>>     Evaluation not satisfied Rule 'Extract bro_weird log fields' (
>>> 5847a07f7d5d445614cdb5e2) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 136 μs
>>>     Evaluate Rule 'Extract bro_software log fields' (
>>> 5847b6017d5d445614cdcc85) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 142 μs
>>>     Evaluation not satisfied Rule 'Extract bro_software log fields' (
>>> 5847b6017d5d445614cdcc85) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 148 μs
>>>     Execute Rule 'Extract bro_notice log fields' (
>>> 584795807d5d445614cdaa4e) in Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27)
>>> 369 μs
>>>     Completed Stage 0 for Pipeline 'Bro IDS Logs' (
>>> 58464b367d5d445614cc4f27), continuing to next stage
>>> 372 μs
>>>     Exit Stage 0
>>> 381 μs
>>>     Finished message processing
>>> I am open to suggestions for where to look next.
>>> Regards,
>>> Brandon
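
For what it's worth, the bro_notice payload in the sample above is plain pipe-delimited text, so the same 26 fields the grok pattern names can also be pulled out by splitting on "|" rather than matching one long pattern of optional alternatives. Below is a rough sketch of that idea in Python (not Graylog's rule language), just to illustrate the approach. The function name parse_bro_notice and the choice to drop "-" and empty values are my own assumptions, and nothing in this thread establishes that the grok pattern is what stalls the output. Whether an equivalent split-style function exists in your Graylog version would need checking.

```python
# Field names copied, in order, from the grok pattern in the rule above.
BRO_NOTICE_FIELDS = [
    "ts", "uid", "id_orig_h", "id_orig_p", "id_resp_h", "id_resp_p",
    "fuid", "file_mime_type", "file_desc", "proto", "note", "msg", "sub",
    "src", "dst", "p", "n", "peer_desc", "actions", "suppress_for",
    "dropped", "remote_location_country_code", "remote_location_region",
    "remote_location_city", "remote_location_latitude",
    "remote_location_longitude",
]

# Payload portion of the sample message from the thread (addresses are
# empty in the posted sample, so they are empty here too).
SAMPLE = (
    "1481164616.440593|CYxMwXBbL0TvQQKm9||51841||443|-|-|-|tcp|"
    "SSL::Invalid_Server_Cert|"
    "SSL certificate validation failed with "
    "(self signed certificate in certificate chain)|"
    ",CN=ADN,OU=ADN,O=External,L=Toledo,ST=Ohio,C=US|||443|-|"
    "BROsensor-eth1-1|Notice::ACTION_LOG|3600.000000|F|-|-|-|-|-"
)


def parse_bro_notice(payload):
    """Split a pipe-delimited bro_notice payload into a dict of fields.

    Hypothetical helper: values that are "-" or empty are treated as
    unset and left out of the result.
    """
    values = payload.split("|")
    if len(values) != len(BRO_NOTICE_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(BRO_NOTICE_FIELDS), len(values))
        )
    return {
        name: value
        for name, value in zip(BRO_NOTICE_FIELDS, values)
        if value not in ("", "-")
    }


if __name__ == "__main__":
    for name, value in parse_bro_notice(SAMPLE).items():
        print(name, "=", value)
```

A split like this does a fixed amount of work per message regardless of content, which makes timing easier to reason about than a pattern with many optional branches. It does assume the delimiter never appears inside a field value.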

