We actually just rolled out our own heka monitoring, so I figured I'd share the solution we came up with:
https://gist.github.com/nathwill/7f5058e92e9026b945f1

The idea is that the pulse filter emits heartbeat events once/minute, which the ekg output logs to a file; monit watches that file, and if it hasn't updated in 15 minutes, considers heka to be wedged and issues a restart.

On Thu, Feb 4, 2016 at 3:56 AM Ramin Ali Dousti <dou...@gmail.com> wrote:

> Thanks for the insight Rob, very helpful.
>
> On Tue, Feb 2, 2016 at 2:15 PM, Rob Miller <rmil...@mozilla.com> wrote:
>
>> The simplest way to monitor heka-1 would probably be to set up a trivial
>> filter on heka-1 that periodically emits a heartbeat message to be
>> delivered to heka-2, and to set up a heartbeat filter on heka-2 that will
>> generate an alert message if the heartbeat fails. I've unfortunately just
>> realized that, while we have a heartbeat filter with nice documentation,
>> the docs for the filter were never correctly wired in to our Sphinx docs so
>> it's not showing up on the readthedocs site. I'll fix that, but in the
>> meantime you can see the heartbeat filter and how to use it here:
>>
>> https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/lua/filters/heartbeat.lua
>>
>> As for monitoring heka-2, you'll probably want to set up a trivial filter
>> on heka-2 that emits a heartbeat message, as before, but you'll need to
>> have something outside of heka-2 listening for those messages and alerting
>> if they stop coming. If you're already using some tool that can serve that
>> purpose, great. If not, you'll need to set one up. You of course could
>> always use a *third* heka instance (we'll call it heka-3), with a heartbeat
>> filter listening for heka-2's heartbeats, in the same way that heka-2 would
>> be listening for heka-1's. You might then want to monitor heka-3, though,
>> which leads us to turtles all the way down. Or maybe you trust that if
>> heka-3 is set up to do absolutely *nothing* other than listen for heka-2's
>> heartbeats, then process level monitoring is good enough, since the chance
>> of it getting wedged when doing absolutely nothing is vanishingly small.
>>
>> Hope this helps,
>>
>> -r
>>
>> On 01/14/2016 05:59 AM, Ramin Ali Dousti wrote:
>>
>>> Hi,
>>>
>>> What is the best way of monitoring the proper working of a heka
>>> instance? Let me give a concrete example: I have heka-1 that has log
>>> files as input and TCP outputs to another heka instance. The second one
>>> would receive the stream from the first one and publishes to elastic
>>> search as well as influxdb. The simplest monitoring is the process
>>> monitoring, making sure hekad is up, but what can be done (what is the
>>> correct way) to make sure that heka-1 can send to heka-2 and heka-2 is
>>> able to publish to elastic search and influxdb?
>>>
>>> As always any insight is greatly appreciated.
>>>
>>> --
>>> Ramin
>>>
>>> _______________________________________________
>>> Heka mailing list
>>> Heka@mozilla.org
>>> https://mail.mozilla.org/listinfo/heka
>
> --
> Ramin
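
For anyone not running monit, the stale-file check described at the top of this message can be approximated with a few lines of Python. This is only a rough sketch of the idea, not what the gist does: the function name heartbeat_is_stale and its parameters are made up for illustration, and it assumes the heartbeat output touches a single file whose modification time can be compared against the 15-minute threshold.

```python
import os
import time

# 15 minutes, matching the threshold mentioned at the top of this message.
STALE_AFTER_SECONDS = 15 * 60


def heartbeat_is_stale(path, max_age=STALE_AFTER_SECONDS, now=None):
    """Return True if the heartbeat file is missing or has not been
    modified within the last max_age seconds.

    Hypothetical helper: 'path' is whatever file the heartbeat output
    writes to; nothing here is part of heka or monit.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated the same as a stale one.
        return True
    return (now - mtime) > max_age
```

A cron job or small supervisor loop could call this with the path of the heartbeat log and restart hekad when it returns True; how the restart is issued depends on your init system, so that part is left out.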