We actually just rolled out our own heka monitoring, so I figured I'd share
the solution we came up with:

https://gist.github.com/nathwill/7f5058e92e9026b945f1

The idea is that the pulse filter emits a heartbeat event once per minute, which
the ekg output logs to a file; monit watches that file and, if it hasn't been
updated in 15 minutes, considers heka to be wedged and issues a restart.
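
The watchdog half of this boils down to comparing a file's modification time
against a threshold. Here's a rough Python sketch of just that check, for
anyone who wants to see the logic spelled out. It is not taken from the gist
(monit does this natively with its own config), and the function name, the
constant, and the choice to treat a missing file as stale are assumptions made
for illustration.

```python
import os
import time

# 15 minutes, matching the threshold described above.
STALE_AFTER_SECONDS = 15 * 60


def heartbeat_is_stale(path, max_age=STALE_AFTER_SECONDS, now=None):
    """Return True if the heartbeat file is missing or older than max_age seconds.

    `path` is wherever the heartbeat output writes its file (hypothetical here).
    `now` can be passed in explicitly to make the check easy to test.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Assumption of this sketch: no file at all counts as stale.
        return True
    return (now - mtime) > max_age
```

Something run from cron could call this and restart hekad when it returns
True, though in practice a tool like monit already covers both the check and
the restart.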

On Thu, Feb 4, 2016 at 3:56 AM Ramin Ali Dousti <dou...@gmail.com> wrote:

> Thanks for the insight Rob, very helpful.
>
> On Tue, Feb 2, 2016 at 2:15 PM, Rob Miller <rmil...@mozilla.com> wrote:
>
>> The simplest way to monitor heka-1 would probably be to set up a trivial
>> filter on heka-1 that periodically emits a heartbeat message to be
>> delivered to heka-2, and to set up a heartbeat filter on heka-2 that will
>> generate an alert message if the heartbeat fails. I've unfortunately just
>> realized that, while we have a heartbeat filter with nice documentation,
>> the docs for the filter were never correctly wired in to our Sphinx docs so
>> it's not showing up on the readthedocs site. I'll fix that, but in the
>> meantime you can see the heartbeat filter and how to use it here:
>>
>>
>> https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/lua/filters/heartbeat.lua
>>
>> As for monitoring heka-2, you'll probably want to set up a trivial filter
>> on heka-2 that emits a heartbeat message, as before, but you'll need to
>> have something outside of heka-2 listening for those messages and alerting
>> if they stop coming. If you're already using some tool that can serve that
>> purpose, great. If not, you'll need to set one up. You of course could
>> always use a *third* heka instance (we'll call it heka-3), with a heartbeat
>> filter listening for heka-2's heartbeats, in the same way that heka-2 would
>> be listening for heka-1's. You might then want to monitor heka-3, though,
>> which leads us to turtles all the way down. Or maybe you trust that if
>> heka-3 is set up to do absolutely *nothing* other than listen for heka-2's
>> heartbeats, then process level monitoring is good enough, since the chance
>> of it getting wedged when doing absolutely nothing is vanishingly small.
>>
>> Hope this helps,
>>
>> -r
>>
>>
>>
>> On 01/14/2016 05:59 AM, Ramin Ali Dousti wrote:
>>
>>> Hi,
>>>
>>> What is the best way of monitoring the proper working of a heka
>>> instance? Let me give a concrete example: I have heka-1 that has log
>>> files as input and TCP outputs to another heka instance. The second one
>>> receives the stream from the first one and publishes to elastic
>>> search as well as influxdb. The simplest monitoring is the process
>>> monitoring, making sure hekad is up, but what can be done (what is the
>>> correct way) to make sure that heka-1 can send to heka-2 and heka-2 is
>>> able to publish to elastic search and influxdb?
>>>
>>> As always any insight is greatly appreciated.
>>>
>>> --
>>> Ramin
>>>
>>>
>>> _______________________________________________
>>> Heka mailing list
>>> Heka@mozilla.org
>>> https://mail.mozilla.org/listinfo/heka
>>>
>>>
>
>
> --
> Ramin
>
