Replying to myself to clarify one of my notes, below...
On 03/17/2015 12:52 PM, Rob Miller wrote:
I've put notes on what you've got here in-line below:

On 03/17/2015 10:44 AM, Hank Beatty wrote:
> Hello,

Hi!

> I'm trying to build some graphs from an ATS Server log file (custom
> format). Here is a copy of the config that I came up with:
>
> [ats_access_logs]
> type="LogstreamerInput"
> splitter = "TokenSplitter"
> decoder = "ATS_transform_decoder"
> log_directory = "/var/log/ats"
> file_match = "custom_ats_psp6cdatsec04.log"

`file_match` is a regular expression value, so you'll save yourself some cycles by escaping the period in your filename, either using "custom_ats_psp6cdatsec04\\.log" or, even better, switching to single quotes for TOML raw strings: 'custom_ats_psp6cdatsec04\.log'.

> [ATS_transform_decoder]
> type = "PayloadRegexDecoder"
> match_regex = '(?P<UnixTimestamp>[\d]+\.[\d]+) chi=(?P<chi>\S+)
> phn=(?P<phn>\S+) shn=(?P<shn>\S+) url=(?P<url>\S+) cqhm=(?P<cqhm>\w+)
> cqhv=(?P<cqhv>\S+) pssc=(?P<pssc>\d+) ttms=(?P<ttms>\d+) b=(?P<b>\d+)
> sssc=(?P<sssc>\d+) sscl=(?P<sscl>\d+) cfsc=(?P<cfsc>\S+)
> pfsc=(?P<pfsc>\S+) crc=(?P<crc>\S+) phr=(?P<phr>\S+) uas=(?P<uas>\S+)'
> #timestamp_layout= 'Dec 14 07:57:35'
>
> [ATS_transform_decoder.message_fields]
> Type = "ats_access"
> host = "%phn%"
> shn = "%shn%"
> clientip = "%chi%"
> Timestamp = "%UnixTimestamp%"
> useragent = "%uas%"
> uri = "%url%"
> method = "%cqhm%"
> status = "%pssc%"
> crc = "%crc%"
> phr = "%phr%"
> version = "%cqhv%"
> request_duration = "%ttms%"

We definitely recommend using Lua and LPEG for this type of parsing job. It's much more composable, will perform a lot better, and provides a lot more flexibility than the PayloadRegexDecoder. You can test out an LPEG grammar using our grammar tester (http://lpeg.trink.com/). There's a tutorial for converting regex to LPEG in the Heka wiki (https://github.com/mozilla-services/heka/wiki/How-to-convert-a-PayloadRegex-MultiDecoder-to-a-SandboxDecoder-using-an-LPeg-Grammar). Also, if you find us in the #heka channel on irc.mozilla.org, we're often able to help folks with debugging their grammars.

> [ATSServer]
> type = "SandboxFilter"
> filename = "lua_filters/ats_graph.lua"
> ticker_interval = 60
> preserve_data = true
> message_matcher = "Fields['Type'] == 'ats_access'"
>
> [ATSServer.config]
> sec_per_row = 60
> rows = 1440
> # anomaly_config = 'roc("HTTP Status", 2, 15, 0, 1.5, true, false)
> roc("HTTP Status", 4, 15, 0, 1.5, true, false) mww_nonparametric("HTTP
> Status", 5, 15, 10, 0.8)'
> preservation_version = 1
>
> [DashboardOutput]
> ticker_interval = 60
>
> I took the http_graph.lua and modified it slightly to fit the above. I
> added some debug lines to the ats_graph.lua:
> table.insert(dbg, "Exiting function process_message")
>
> inject_payload ("txt", "debug", table.concat(dbg, "\n"))
>
> I can't find where these are being written.

Any output from a SandboxFilter should show up in the DashboardOutput. Clicking on the "Sandboxes" link should show you all of the sandbox filters and their outputs; clicking on your filter's "debug" output should show you what's being generated. Is this not happening?

> From what I can tell the messages aren't making it to the
> SandboxFilter. I'm thinking the message_matcher is wrong?

That may very well be. The best way to debug a message_matcher is to set up a LogOutput (or, if the volume is too high, a FileOutput) using the same message_matcher value to see what you're getting. The output should use an RstEncoder, which will spit out a reStructuredText rendering of the entire message for every message that the matcher catches. If you do this and find that the message_matcher isn't catching the messages you want, you can try a more general message_matcher, all the way up to `message_matcher = "TRUE"`, which will catch everything. Once

My colleague pointed out that this was a bit ambiguous... When I say to try a more general message_matcher, all the way up to "TRUE", I mean for the LogOutput, as a means of debugging. You *can't* use "TRUE" as the message_matcher on a SandboxFilter, because that will lead to the filter shutting itself down to prevent infinite routing loops when the filter tries to inject a message. The idea here is to use the LogOutput to refine the message_matcher until you're catching only the messages that you want the filter to get; then you can apply that matcher to the filter and remove the LogOutput from your config altogether.

-r
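
For reference, the temporary debugging output described above might look roughly like the following. This is only a sketch: the section name `ats_debug_log` is made up for illustration, and it assumes a Heka version where encoders are declared as their own config sections and referenced by name through the `encoder` setting.

```toml
# Declare the encoder so that outputs can refer to it by name.
[RstEncoder]

# Hypothetical section name; a temporary output used only while debugging.
[ats_debug_log]
type = "LogOutput"
encoder = "RstEncoder"
# Start with the matcher copied verbatim from the [ATSServer] filter.
message_matcher = "Fields['Type'] == 'ats_access'"
# If nothing is logged, loosen the matcher step by step. "TRUE" catches
# every message and is only safe here on the output, not on the filter.
# message_matcher = "TRUE"
```

Once the LogOutput shows only the messages the filter should receive, the same matcher string can be copied back to the SandboxFilter and this section deleted.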

