Never mind. It seems the problem was that the hekad process was killed at some
point, so the LogstreamerInput's saved file position ended up somewhere in the
middle of a message. Because I did not reset the pointers
(sudo rm -rf /var/cache/hekad/*), hekad resumed where it had stopped the next
time I ran it: in the middle of that message.
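
To illustrate the failure mode, here is a minimal Python sketch. It is a
simplified model, not Heka's implementation: `split_records`, `read_from` and
the sample `LOG` are hypothetical names made up for this illustration, and the
delimiter is modelled on the "DB QUERY" one quoted below. It shows that
resuming a read from an offset that lies inside a record yields a first
payload without its timestamp header, while later records split normally.

```python
import re

# Delimiter modelled on the "DB QUERY" one from this thread (assumption:
# Python's `re` treats this pattern the same way Heka's regex engine does).
DELIMITER = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} DB QUERY:)\n"
)

# Hypothetical sample log with two records.
LOG = (
    "2015-01-02T13:14:15+01:00 DB QUERY:\n"
    "SELECT id, name\n"
    "    FROM table\n"
    "    WHERE id = 15\n"
    "\n"
    "2015-01-02T13:14:15+01:00 DB QUERY:\n"
    "SELECT id, somethingelse\n"
    "  FROM table2\n"
)


def split_records(data):
    """Split data so that every delimiter match starts a new record.

    Simplification: the whole match (header line plus newline) is kept at
    the start of the record it introduces.
    """
    starts = [m.start() for m in DELIMITER.finditer(data)]
    if not starts:
        return [data] if data else []
    records = []
    if starts[0] > 0:
        # Bytes before the first delimiter: a record with no header.
        records.append(data[:starts[0]])
    for begin, end in zip(starts, starts[1:] + [len(data)]):
        records.append(data[begin:end])
    return records


def read_from(offset):
    """Pretend the reader resumes at a saved byte offset."""
    return split_records(LOG[offset:])


clean = read_from(0)   # fresh start: every record begins with a timestamp
stale = read_from(5)   # saved offset points into the first header

for record in stale:
    print(repr(record[:30]))
```

In this model `stale[0]` begins with `01-02T...` rather than a full timestamp,
much like the truncated payloads in the error output quoted below, while
`stale[1]` is identical to the second record of a clean read.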

From: Heka [mailto:[email protected]] On Behalf Of Gołębiewski Piotr DRA-BRB-ZIS
Sent: Friday, July 17, 2015 11:19 AM
To: [email protected]
Subject: [heka] [RegexSplitter] not splitting data into correct payloads?

Hi,

I've got some logs in the following format:

yyyy-MM-ddTHH:mm:ssZ (Some header #1)
{multiline message body below, e.g. a PHP array dump}

yyyy-MM-ddTHH:mm:ssZ (Some header #2)
{multiline message body below, e.g. a PHP array dump}

Example:

2015-01-02T13:14:15+01:00 DB QUERY:
SELECT id, name
    FROM table
    WHERE id = 15
Namespace\Class:method /path/to/file/calling/log/Class:123

2015-01-02T13:14:15+01:00 DB QUERY:
SELECT id, somethingelse
  FROM table2
 WHERE somethingelse = 'foobar'
Namespace\Class:method /path/to/file/calling/log/Class:123

Another example (different log, similar problem):

2015-01-02T13:14:15+01:00 REQUEST -> http://123.45.67.89:0123
array (
    0 => 'get_something_action',
    1 =>
    array (
        'params' =>
        array (
            'someparam' => 'foo',
            'otherparam' => 'bar',
        )
    )
)

2015-01-02T13:14:15+01:00 RESPONSE <- http://123.45.67.89:0123
array (
    'state' => 'ok',
    'params' =>
    array (
        'some_return_param' => 'foo_bar_baz'
    )
)

I am using RegexSplitter with the delimiter configured like this:

* delimiter = '(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} DB QUERY:)\n'

* delimiter = '(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} \S+ [<-][->] \S+.*)\n'

* delimiter = '(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} .+)\n'

(Different delimiters are for different logs, but their structure is the same:
the only difference is the "header" following the timestamp.)
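
As a sanity check on delimiters like these, the following Python sketch runs
each pattern against a sample header line taken from the examples above and
against a message-body line. It is only an approximation: it assumes Python's
`re` module treats these particular patterns the same way as the regex engine
Heka uses, and the names `TS`, `DELIMITERS`, `SAMPLES`, `BODY_LINE` and
`header_of` are hypothetical helpers introduced here.

```python
import re

# Timestamp prefix shared by all three delimiters.
TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}"

# The three delimiters from the message, keyed by made-up short names.
DELIMITERS = {
    "db": re.compile(r"(" + TS + r" DB QUERY:)\n"),
    "http": re.compile(r"(" + TS + r" \S+ [<-][->] \S+.*)\n"),
    "any": re.compile(r"(" + TS + r" .+)\n"),
}

# One header line per delimiter, copied from the example logs.
SAMPLES = {
    "db": "2015-01-02T13:14:15+01:00 DB QUERY:\n",
    "http": "2015-01-02T13:14:15+01:00 RESPONSE <- http://123.45.67.89:0123\n",
    "any": "2015-01-02T13:14:15+01:00 REQUEST -> http://123.45.67.89:0123\n",
}

# A line from a message body; no delimiter should match it.
BODY_LINE = "    WHERE id = 15\n"


def header_of(name, text):
    """Return the captured header if the named delimiter matches, else None."""
    match = DELIMITERS[name].search(text)
    return match.group(1) if match else None


results = {name: header_of(name, SAMPLES[name]) for name in DELIMITERS}
for name, header in results.items():
    print(name, "->", header)
```

If every sample yields its full header line and the body line yields `None`,
the patterns themselves are probably not the cause of truncated payloads, and
the input side (for example, where reading starts) is worth checking next.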

When I run hekad (with PayloadEncoder/LogOutput) I get the following errors:
(note: empty lines = successful messages, because I have the "payload_keep"
param set to false)
(note 2: failing to parse an empty payload is also OK; it means the log file
begins with an empty line, which is not a valid message)

Example output:

2015/07/17 10:44:51
2015/07/17 10:44:51
2015/07/17 10:44:51 Decoder 'AcmeNLogInput-AcmeNLogDecoder-23' error: Failed 
parsing:  payload:
2015/07/17 10:44:51 Decoder 'AcmeDBLogInput-AcmeDBLogDecoder-6' error: Failed 
parsing:  payload: 15-04-12T15:06:45+02:00 DB QUERY:
select * from session where token = :token
{":token":"foobar"}
Acme\Database::isSessionExist /path/to/src/Acme/Database.php:123

2015/07/17 10:44:51
2015/07/17 10:44:51 Decoder 'AcmeDBLogInput-AcmeDBLogDecoder-5' error: Failed 
parsing:  payload: _name, hash
                   FROM devices
                   WHERE device_hash = :hash AND
                         client_name = :name AND
                         ip_address = :ip
{":ip":"123.45.67.89",":hash":"foobarbaz",":name":"ACME-01-NYC"}
Acme\Database::getDevice /path/to/src/Acme/Database.php:456

2015/07/17 10:44:51
2015/07/17 10:44:51 Decoder 'AcmeDBLogInput-AcmeDBLogDecoder-14' error: Failed 
parsing:  payload: arams /path/to/src/Acme/Database.php:234

2015/07/17 10:44:51
2015/07/17 10:44:51 Decoder 'AcmeDBLogInput-AcmeDBLogDecoder-9' error: Failed 
parsing:  payload: _hash
                   FROM devices
                   WHERE device_hash = :hash AND
                         client_name = :name AND
                         ip_address = :ip
{":ip":"98.76.54.32",":hash":"barfoobaz",":name":"ACME-01-NYC"}
Acme\Database::getDevice /path/to/src/Acme/Database.php:456


My questions are:

* Why does the payload start with "arams /path/to/src/Acme/Database (...)" or
  "_hash (...)" or "15-04-12T15:06:45+02:00 DB QUERY (...)"? It should ALWAYS
  start with yyyy-MM-ddTHH:mm:ss {Some Header}: all decoders start with the
  timestamp pattern and `delimiter_eol` is set to false (so the delimiter
  should be prepended to the payload).

* I've found these messages in my log files, and they are no different (in
  format/pattern) from the others. It seems that sometimes the start of the
  payload gets cut off for some reason?

My hekad config:
[hekad]
max_message_size = 15728640    # 15MB
maxprocs = 4
poolsize = 10

Example Splitter config:
[AcmeMainLogSplitter]
type = "RegexSplitter"
delimiter = '(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} DEVICE:.*)\n'
delimiter_eol = false

Example Decoder config:
[AcmeMainLogDecoder]
type = "SandboxDecoder"
filename = "/etc/heka/lua_decoders/acme_main_log.lua"
memory_limit = 15728640    # 15MB
output_limit = 15728640    # 15MB
instruction_limit = 1500000  # 1.5 million


Any ideas why this happens?
---------------------------------------------
Siedziba: Getin Noble Bank SA, ul. Przyokopowa 33, 01-208 Warszawa
Sad rejestrowy: Sad Rejonowy dla m.st. Warszawy w Warszawie, XII Wydzial 
Gospodarczy.
Numer KRS: 0000304735
NIP: 108-000-48-50
Wysokosc kapitalu zakladowego oplaconego w calosci: 2 650 143 319,00 zl
Zamieszczenie powyzszych danych identyfikujacych Getin Noble Bank SA stosownie 
do art. 374 par.1 Kodeksu spolek handlowych nie jest rownoznaczne z handlowym 
charakterem dostarczonej do Panstwa wiadomosci e-mailowej i pozostaje bez 
wplywu na interpretacje zawartych w niej oswiadczen.


Niniejszy e-mail oraz wszelkie zalaczone do niego pliki sa poufne i moga 
podlegac ochronie prawnej. Jezeli nie jest Pan/Pani zamierzonym adresatem 
powyzszej wiadomosci, nie moze jej Pan/Pani ujawniac, kopiowac, dystrybuowac, 
ani tez w zaden inny sposob udostepniac lub wykorzystywac. O blednym 
zaadresowaniu wiadomosci prosimy niezwlocznie poinformowac nadawce i usunac 
wiadomosc.


This e-mail message may contain confidential and/or privileged information. If 
you are not the intended recipient (or have received this e-mail in error) 
please notify the sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this e-mail is strictly 
forbidden.

_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
