I've put notes on what you've got here in-line below:
On 03/17/2015 10:44 AM, Hank Beatty wrote:
> Hello,
Hi!
> I'm trying to build some graphs from an ATS Server log file (custom
> format). Here is a copy of the config that I came up with:
>
> [ats_access_logs]
> type="LogstreamerInput"
> splitter = "TokenSplitter"
> decoder = "ATS_transform_decoder"
> log_directory = "/var/log/ats"
> file_match = "custom_ats_psp6cdatsec04.log"
`file_match` is a regular expression, and an unescaped period matches any
character, so you'll save yourself some cycles (and possible surprises) by
escaping the period in your filename, either as
"custom_ats_psp6cdatsec04\\.log" or, even better, by switching to single
quotes, which TOML treats as literal (raw) strings:
'custom_ats_psp6cdatsec04\.log'.
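For reference, here is your input section with that change applied. Everything except the `file_match` line is copied from your config. The double-quoted form is left as a comment to show the equivalent escaping.

```toml
[ats_access_logs]
type = "LogstreamerInput"
splitter = "TokenSplitter"
decoder = "ATS_transform_decoder"
log_directory = "/var/log/ats"
# Single-quoted TOML literal string: the backslash reaches the regex engine
# unchanged, so "\." matches only a literal period.
file_match = 'custom_ats_psp6cdatsec04\.log'
# Equivalent double-quoted (basic) string, where the backslash itself must
# be escaped:
# file_match = "custom_ats_psp6cdatsec04\\.log"
```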
> [ATS_transform_decoder]
> type = "PayloadRegexDecoder"
> match_regex = '(?P<UnixTimestamp>[\d]+\.[\d]+) chi=(?P<chi>\S+)
> phn=(?P<phn>\S+) shn=(?P<shn>\S+) url=(?P<url>\S+) cqhm=(?P<cqhm>\w+)
> cqhv=(?P<cqhv>\S+) pssc=(?P<pssc>\d+) ttms=(?P<ttms>\d+) b=(?P<b>\d+)
> sssc=(?P<sssc>\d+) sscl=(?P<sscl>\d+) cfsc=(?P<cfsc>\S+)
> pfsc=(?P<pfsc>\S+) crc=(?P<crc>\S+) phr=(?P<phr>\S+) uas=(?P<uas>\S+)'
> #timestamp_layout= 'Dec 14 07:57:35'
>
> [ATS_transform_decoder.message_fields]
> Type = "ats_access"
> host = "%phn%"
> shn = "%shn%"
> clientip = "%chi%"
> Timestamp = "%UnixTimestamp%"
> useragent = "%uas%"
> uri = "%url%"
> method = "%cqhm%"
> status = "%pssc%"
> crc = "%crc%"
> phr = "%phr%"
> version = "%cqhv%"
> request_duration = "%ttms%"
We definitely recommend using Lua and LPeg for this type of parsing job.
It's much more composable, will perform a lot better, and provides a lot
more flexibility than the PayloadRegexDecoder. You can test out an LPeg
grammar using our grammar tester (http://lpeg.trink.com/). There's
a tutorial for converting a regex to LPeg in the Heka wiki
(https://github.com/mozilla-services/heka/wiki/How-to-convert-a-PayloadRegex-MultiDecoder-to-a-SandboxDecoder-using-an-LPeg-Grammar).
Also, if you find us in the #heka channel on irc.mozilla.org, we're
often able to help folks debug their grammars.
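On the config side, switching over would roughly mean replacing the PayloadRegexDecoder section with a SandboxDecoder that points at a Lua script containing your grammar. This is a minimal sketch. The script path is hypothetical, and the Lua decoder itself (which would parse the payload with LPeg and inject the resulting message) is left to you. Keeping the same section name means the LogstreamerInput's `decoder` setting doesn't need to change.

```toml
[ATS_transform_decoder]
type = "SandboxDecoder"
# Hypothetical path to a Lua decoder you'd write, e.g. by following the wiki
# tutorial. The script would also need to set the message Type itself (for
# instance to "ats_access") so that downstream matchers still apply.
filename = "lua_decoders/ats_access.lua"
```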
> [ATSServer]
> type = "SandboxFilter"
> filename = "lua_filters/ats_graph.lua"
> ticker_interval = 60
> preserve_data = true
> message_matcher = "Fields['Type'] == 'ats_access'"
>
> [ATSServer.config]
> sec_per_row = 60
> rows = 1440
> # anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false)
> roc("HTTP Status", 4, 15, 0, 1.5, true, false) mww_nonparametric("HTTP
> Status", 5, 15, 10, 0.8)'
> preservation_version = 1
>
> [DashboardOutput]
> ticker_interval = 60
>
> I took the http_graph.lua and modified it slightly to fit the above. I
> added some debug lines to the ats_graph.lua:
> table.insert(dbg, "Exiting function process_message")
>
> inject_payload ("txt", "debug", table.concat(dbg, "\n"))
>
> I can't find where these are being written.
Any output from a SandboxFilter (i.e. anything it emits via inject_payload)
should show up in the DashboardOutput. Clicking on the "Sandboxes" link
should show you all of the sandbox filters and their outputs; clicking on
your filter's "debug" output should show you what's being generated. Is
this not happening?
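If the dashboard itself isn't coming up, it can help to set its listen address explicitly so you know which port to browse to. This sketch assumes DashboardOutput's `address` option and a default of port 4352. Please double-check both against the docs for your Heka version.

```toml
[DashboardOutput]
# Listen on all interfaces, port 4352 (believed to be the default), then
# browse to http://<heka-host>:4352/ and follow the "Sandboxes" link.
address = "0.0.0.0:4352"
# How often, in seconds, the dashboard files are refreshed. With 60 here,
# new sandbox output can take up to a minute to appear.
ticker_interval = 60
```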
> From what I can tell the messages aren't making it to the
> SandboxFilter. I'm thinking the message_matcher is wrong?
That may very well be. The best way to debug a message_matcher is to set
up a LogOutput (or, if the volume is too high, a FileOutput) using the
same message_matcher value to see what you're getting. The output should
use an RstEncoder, which will spit out a reStructuredText rendering of
the entire message for every message that the matcher catches.
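A debugging setup along those lines might look like this. The output section names and the file path are illustrative. The matcher is copied from your SandboxFilter so you see exactly what the filter would see.

```toml
# The encoder's type is inferred from the section name.
[RstEncoder]

[matcher_debug_output]
type = "LogOutput"
message_matcher = "Fields['Type'] == 'ats_access'"
encoder = "RstEncoder"

# Higher-volume alternative: write to a file instead of Heka's own log.
# [matcher_debug_file]
# type = "FileOutput"
# path = "/tmp/ats_matcher_debug.log"   # illustrative path
# message_matcher = "Fields['Type'] == 'ats_access'"
# encoder = "RstEncoder"
```

One thing to watch for in the RST dump: Type is one of Heka's standard message header names. Your decoder's `Type = "ats_access"` line therefore most likely sets the message's Type header rather than creating `Fields['Type']`. If the dump shows the value in the header, `message_matcher = "Type == 'ats_access'"` is the matcher to try.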
If you do this and find that the message_matcher isn't catching the
messages you want, you can try a more general message_matcher, all the
way up to `message_matcher = "TRUE"`, which will catch everything. Once